
A data mining analysis of RTID alarms

2000, Computer Networks

Stefanos Manganaris, Marvin Christensen, Dan Zerkle, Keith Hermiz
International Business Machines Corporation
19 Lakehurst Court, Research Triangle Park, NC 27713
{stefanos,marvinc,dzerkle,khermiz}@us.ibm.com

Abstract

IBM's Emergency Response Service provides real-time intrusion detection (RTID) services through the Internet for a variety of clients. As the number of clients increases, the volume of alerts generated by the RTID sensors becomes intractable. This problem is aggravated by the fact that some sensors may generate hundreds or even thousands of innocent alerts per day. With an eye towards managing these alerts more effectively, IBM's data mining services group analyzed a database of RTID reports. The first objective was an approach for characterizing the "normal" stream of alerts from a sensor. Using such models tuned to individual sensors, we then developed a methodology for detecting anomalies. In contrast to many popular approaches, the decision to filter an alarm out or not takes into consideration the context in which it occurred and the historical behavior of the sensor it came from. Our second objective was to identify all the different profiles of our clients. Based on their history of alerts, we discovered several different types of clients, with different alert behaviors and thus different monitoring needs. We present the issues encountered, solutions, and findings, and discuss how our results may be used in large-scale RTID operations.

1 Introduction

Organizations collect huge volumes of data from their daily operations. This wealth of data is often under-utilized. Data mining is a process of drilling through large amounts of data to discover hidden key facts that can drive decision making.
Data mining helps companies reap rewards from their data warehouse investments, by transforming data into actionable knowledge, revealing relationships, trends, and answers to specific questions that are too broad in nature for traditional query and reporting tools.

Knowledge discovery and data mining is a relatively young, interdisciplinary, field that cross-fertilizes ideas from several research areas, including machine learning, statistics, databases, and data visualization. With its origins in academia about ten years ago, the field has recently captured the imagination of the business world and is making important strides by creating knowledge discovery applications in many business areas, driven by the rapid growth of on-line data volumes. Fayyad, et al., [1] presents a good, though somewhat dated, overview of the field. Bigus [2] and Berry and Linoff [3], among others, have written introductory books on data mining that include good descriptions of several business applications.

Copyright 1999, IBM. All rights reserved.

Commercial intrusion-detection systems often generate mountains of data. However, most large-scale enterprise operations use fairly simple filters to screen alarms in order to cope with their sheer volume; little else is usually done with this data. Emergency Response Services and the Knowledge Discovery Consulting practice at IBM work together in a pilot project to reveal the value hidden deep inside these databases of alarms. The first phase of our plan, described in this paper, focused on two objectives. First, we sought to improve anomaly detection. In contrast to common approaches that analyze alarms in isolation, we felt that alarm context could enhance decision making. Moreover, we wanted to develop an adaptive approach where the system would learn to detect anomalies based on its experience from the history of alarms. We expand on that work in section 2 of this paper.
Our second objective was to understand the clients we serve better from the perspective of alarm histories they tend to generate. That work is described in section 3.

2 Data Mining for Anomaly Detection

IBM provides real-time intrusion detection services to clients world-wide. Commercially available sensors, such as NetRanger from Cisco Systems, are deployed on customer networks. All intrusion alarms are sent over the Internet to IBM's Network Operations Center (NOC) in Boulder, Colorado, which provides 7 × 24 first-level monitoring. The database of alarms in Boulder is one of the largest known collections of intrusion data and has alluring potential for insights into intrusion behaviors.

Operators at the NOC deal with thousands of incoming alarms from each sensor every day, using sophisticated filtering and summarization tools to determine in real-time the extent and source of potential attacks. Even though these tools perform admirably, success currently depends critically on careful hand-crafting of the filtering and summarization rules. As the number of sensors increases, the data volume rises, and this task becomes harder to keep up with. By necessity, most manually crafted rules are fairly simple, placing a lot of weight on priority levels statically pre-assigned to different alarm types. Most of the rules do not correlate alarms and do not take into account the context in which alarms are raised or the historical behavior of the sensor raising the alarm.

Our long-term vision for NOC is illustrated in Figure 1. Operators are assisted by an automated decision engine, which screens incoming alarms using a knowledge-base of decision rules, which is updated with the assistance of a data mining engine that analyzes historical data and feedback from incident resolutions.

Taking the first steps in that direction, we wanted to investigate whether the "normal" stream of alarms, generated by sensors under conditions not associated with intrusion attacks, can be characterized.
Moreover, given such models of normal behavior, we wanted to develop a method that improves incident detection while reducing the volume of false positive alarms and thus also operator workload.

The basic idea of our approach is simple: frequent behavior, over extended periods of time, is likely to be normal. A combination of specific alarm types occurring within seconds from each other, always in the same order, every few minutes, for an extended time period, is most likely less suspicious than a sudden burst of alarms never seen before.

[Figure 1: Vision for Network Operations Center. Diagram labels: customer network, pre-filtering, alarm, NOC, ERS, decision engine, KB, mining engine, DB, alarm history, resolution info, intrusion suspected, intrusion confirmed, customer.]

We used association analysis, in IBM's Intelligent Miner for Data toolkit [4], to discover all frequent sets of alarms. The problem of mining association rules was introduced by Agrawal, et al., in [5]. The input consists of a set of supermarket transactions, where each transaction is a set of literals (called items). An example of an association rule is: "30% of transactions that contain beer also contain diapers; 2% of all transactions contain both of these items." Here 30% is called the confidence of the rule and 2% is the support of the rule. The problem is to find all association rules that exceed user-specified minimum support and minimum confidence thresholds. Common approaches decompose the problem into two subproblems: (i) find all combinations of items that have transaction support above minimum support; call those combinations frequent itemsets. A combination of items X has support s in the transaction set Y, if s% of the transactions in Y contain X. (ii) use frequent itemsets to generate the desired rules. The idea is that if, say, ABCD and AB are frequent itemsets, then AB → CD is an association rule of the solution iff the ratio of ABCD support over AB support exceeds the minimum confidence threshold.
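The definitions of support and confidence given above can be illustrated with a minimal sketch. The toy transactions, the function names, and the threshold values are assumptions made for illustration only; this is not the Intelligent Miner implementation, and a real miner would not scan all transactions for every candidate itemset.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(antecedent and consequent together) / support(antecedent)."""
    a = frozenset(antecedent)
    s_a = support(a, transactions)
    if s_a == 0.0:
        return 0.0  # the rule never applies in this data
    return support(a | frozenset(consequent), transactions) / s_a


# Hypothetical toy data, not taken from the paper.
transactions = [
    frozenset({"beer", "diapers", "chips"}),
    frozenset({"beer", "diapers"}),
    frozenset({"beer", "chips"}),
    frozenset({"milk", "bread"}),
    frozenset({"beer"}),
]

s = support({"beer", "diapers"}, transactions)       # 2 of 5 transactions
c = confidence({"beer"}, {"diapers"}, transactions)  # 2 of the 4 beer transactions

# Assumed thresholds: the rule is kept only if both are met.
MIN_SUPPORT, MIN_CONFIDENCE = 0.3, 0.6
keep_rule = s >= MIN_SUPPORT and c >= MIN_CONFIDENCE
```

With these toy numbers the itemset {beer, diapers} is frequent, but the rule beer → diapers is dropped because its confidence falls below the assumed minimum.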
The first subproblem is computationally more demanding and has been the focus of considerable work on developing fast algorithms (e.g., Agrawal, et al., [6], Brin, et al., [7]).

We characterized normal alarm behavior in terms of frequent itemsets and association rules as follows. First, the continuous stream of alarms was partitioned into bursts of alarms. Each burst corresponds to a transaction, in the lingo of association analysis, and items refer to alarms. Alarm bursts were identified by larger than average interalarm times at their start and end points (footnote 1). Frequent itemsets refer to combinations of alarms that tend to occur often within bursts. Association rules relate the occurrence of one set of alarms with another in the same burst. The model of normal behavior for a sensor consisted of the collection of frequent itemsets and association rules with high confidence. The minimum support and confidence thresholds were chosen empirically; more on this later.

In this work, the ordering of alarms within bursts was not modeled. The same approach, however, can be taken using sequential pattern analysis [8, 9]. Sequential patterns refer to frequent alarm sequences rather than frequent sets. We believe there are cases where ordering over time is of significance, and we are currently investigating an extension in that direction.

The algorithm for detecting deviations from normal behavior is shown in Figure 2. Given a set of frequent itemsets, high confidence rules, and an incoming burst of alarms, we first checked whether the set of alarms is a known frequent itemset. In that case, this is likely an innocent set of alarms, with expected frequency of occurrence equal to the itemset's support. Failing that, we identified the set of most specific supported itemsets M. A frequent itemset is supported by a burst of alarms, when it is a subset of the alarms in the burst. A supported frequent itemset is most specific when no other frequent itemset that is a superset of it is also supported.
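The two definitions just given (supported, and most specific) translate almost directly into set operations. The following is a minimal sketch under our own naming; the function names and the single-letter alarm types are hypothetical and chosen only for illustration.

```python
def is_known_frequent(burst, frequent_itemsets):
    """True when the burst, taken as a set of alarm types, is itself a frequent itemset."""
    return frozenset(burst) in {frozenset(s) for s in frequent_itemsets}


def most_specific_supported(burst, frequent_itemsets):
    """Return the set M of most specific supported itemsets for a burst.

    An itemset is supported when it is a subset of the burst; it is most
    specific when no other supported itemset is a proper superset of it.
    """
    burst = frozenset(burst)
    supported = {frozenset(s) for s in frequent_itemsets if frozenset(s) <= burst}
    return {s for s in supported if not any(s < other for other in supported)}


# Hypothetical model and burst (alarm types shown as single letters).
F = [{"A"}, {"B"}, {"A", "B"}, {"C"}, {"D"}]
burst = {"A", "B", "C", "E"}

known = is_known_frequent(burst, F)
M = most_specific_supported(burst, F)
```

Here the burst is not itself a frequent itemset, and M contains {A, B} and {C}: the itemsets {A} and {B} are supported but are subsumed by {A, B}, while {D} is not supported at all.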
The set M shows combinations of alarms that are known to occur, independently of each other, frequently within bursts, but are not known to co-occur. This is one type of an anomaly where the context in which alarms occur is of importance. The set D contains any alarms not covered by any of the patterns in M. When D is not empty, we have a burst that contains alarms that occur with frequency lower than the minimum support threshold in any context.

Another type of anomaly involves the unexpected absence of alarms. This is where association rules come into play (cf. check rules in Figure 2). Any high-confidence rule A → C such that A is supported in burst B while C is not, is indicating that the set of alarms C, which was known to frequently occur in the particular context, is unexpectedly missing. A measure of interestingness for this anomaly is the probability of the event, 1 - confidence(A → C).

    input:  model of normal sensor behavior (frequent sets F and rules R)
            stream of alarm bursts from sensor
    output: alarm bursts that deviate from normal behavior

    LOOP
        B = read next burst of alarms
        IF (∃ s ∈ F : s = B) THEN
            check rules(R), report frequent(B)
        ELSE
            M = most specific supported itemsets(B, F)
            D = B - ∪_{m_i ∈ M} m_i
            check rules(R), report infrequent(B, M, D)
        END
    END

    Figure 2: Algorithm for Anomaly Detection

Footnote 1: We are investigating a more sophisticated approach based on box counting, often used for computing fractal dimensions in the analysis of dynamic systems, which avoids the need for an arbitrary threshold on interalarm times.

To set the minimum support threshold, take into consideration the frequency of any known innocent alarms. The higher the threshold, the fewer the frequent itemsets, and thus the fewer the alarm bursts that we can assign a precise estimate on the frequency of occurrence. More and more bursts would be flagged as anomalies, raising the rate of false positives while lowering the rate of false negatives.
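The uncovered set D and the check for broken rules described in this section can be sketched as follows. This is an illustrative reading of the text under stated assumptions, not the authors' code: rules are represented here as hypothetical (antecedent, consequent, confidence) triples, and the function names and alarm letters are our own.

```python
def uncovered_alarms(burst, most_specific):
    """D: alarms in the burst not covered by any itemset in M."""
    covered = frozenset().union(*most_specific)
    return frozenset(burst) - covered


def broken_rules(burst, rules):
    """Find rules A -> C where A is supported by the burst but C is not.

    `rules` is an iterable of (antecedent, consequent, confidence) triples.
    Each result carries 1 - confidence as the interestingness measure, so
    broken high-confidence rules get the smallest probabilities.
    """
    burst = frozenset(burst)
    broken = []
    for antecedent, consequent, conf in rules:
        a, c = frozenset(antecedent), frozenset(consequent)
        if a <= burst and not c <= burst:
            broken.append((a, c, 1.0 - conf))
    return broken


# Hypothetical burst, itemsets, and rules (alarm types shown as letters).
burst = {"A", "B", "C", "E"}
M = [frozenset({"A", "B"}), frozenset({"C"})]
R = [
    ({"A"}, {"B"}, 0.99),       # holds: B is present
    ({"A", "B"}, {"D"}, 0.98),  # broken: D is unexpectedly missing
    ({"D"}, {"A"}, 0.99),       # not applicable: D is not in the burst
]

D = uncovered_alarms(burst, M)
violations = broken_rules(burst, R)
```

In this example D contains the single alarm E, which no supported frequent itemset accounts for, and one rule is reported as broken with probability of about 0.02.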
Regarding the minimum confidence threshold, higher values produce fewer, more confident, rules. Broken high-confidence rules are more interesting than broken low-confidence rules. The higher the confidence threshold, the fewer bursts are flagged as anomalies because of broken rules, lowering the rate of false positives and raising the rate of false negatives.

We tested these ideas off-line on two NetRanger sensors with good results, and currently plan an on-line evaluation with a more rigorous methodology. NetRanger is a misuse detection system that works by comparing network traffic against a database of signatures of known behavior that may indicate an attack. Upon spotting such behavior, NetRanger generates an alarm, showing the source and destination address of the network traffic that caused it, the type of alarm, and the time and date issued. Each alarm type is given a priority level from one to five; high level alarms are more likely to be of high concern.

For the preliminary results we report here, we used two weeks worth of data for training normal behavior models, and tested on the following week. For one of the sensors, indicatively, we had 393 thousand alarms to train on, with 107 distinct alarm types, in about nine thousand bursts of between one and 47 distinct alarm types per burst. The derived model of normal behavior consisted of about 600 frequent itemsets and 850 rules (minimum support of 1% and confidence of 98%). During testing, we processed 170 thousand alarms and flagged 314 anomalies of various types. After focusing on anomalies that (i) have low support and (ii) contain alarms not frequently occurring in the context of the flagged burst, their number is reduced significantly, to about ten per day per sensor on average.
We compared this reduced number of anomalies with the incidents recorded on the call-log maintained by the operators at the NOC, and found that (i) we had detected each of the incidents recorded on the call-log, and (ii) we had flagged an additional three to eight anomalies per sensor per day that had gone undetected at the NOC, when they could have provided earlier warning and increased robustness.

Association analysis has been used previously for intrusion detection. Lee et al. [10] present a data mining framework for inducing concise and intuitive classification rules that can detect intrusions. At the core of their framework is a classifier that can be trained to discriminate between normal and other intrusion behaviors. The success of a classifier system in this task depends critically on (i) having sufficient data to cover the behaviors of interest and (ii) on engineering the right set of features to describe instances of behavior. In their approach, association rules and frequent episodes are computed from audit data as the basis for guiding the audit data gathering and feature selection processes.

Wespi et al. [11] also present a behavior-based approach for intrusion detection. The system looks for patterns in audit data, which are then used as models of normal behavior. Patterns are induced by the Teiresias algorithm [12]. After training, a pattern matching algorithm compares the observed behavior to the stored patterns. When the quality of the match deteriorates, a deviation is flagged. Differences between the two works exist in (i) what constitutes a pattern and how they are derived and (ii) how patterns are used for intrusion detection. Teiresias analyzes strings over characters of an alphabet and identifies all rigid patterns that repeat at least K times, for some value of K. Patterns are guaranteed to be maximal in both length and composition. Patterns are strings that may contain an arbitrary combination of "don't care" characters.
As applied for anomaly detection, Teiresias analyzed the sequence of system calls generated by a Unix process. Common invocations of the same process exhibit certain patterns in the sequence of calls. Intrusions are assumed to exercise abnormal paths in the executable code and the sequence of calls does not match the expected patterns.

3 Sensor Profiling

Each sensor has its own history of alarm behavior; sensors vary in terms of alarm types, alarm rates, distribution of alarms over day-of-week and time-of-day, etc. Can sensor behaviors be assigned to general sensor profiles? The question we set out to explore was whether clients clustered into natural groupings based on the similarity of alarm histories from their sensors. Moreover, we wanted to profile the various types of alarm behavior and corresponding segments of customers.

Our motive behind this work is deeper understanding; insights in the behaviors of alarms, sensors, computer networks, customers, industries, and geographies. If indeed there are distinct segments of alarm behavior, we can much more easily put our arms around the set of sensors monitored and begin tailoring servicing strategies at the segment-level rather than the individual sensor-level. Moreover, once segments have been profiled, one can investigate factors that may affect alarm behavior. For example, is industry or geography a factor? In what ways do they affect alarm behaviors? Insights regarding behavior often lead to insights regarding customer needs and thus opportunities for differentiation from the competition.

In marketing, the partitioning of a population of customers based on criteria that discriminate their wants and needs is called market segmentation. There are several examples in the knowledge discovery literature where mining techniques have been applied to this problem with good results. We are not aware, however, of related prior work in the segmentation of customers based on the alarm histories of their intrusion detection sensors.
We illustrate our work on a sample of 27 NetRanger sensors corresponding to 25 unidentified clients. We monitored these sensors for a period of about one month and collected roughly 12 × 10^6 alarms, corresponding to about 2GB of data. Each alarm was described in terms of its type, priority level, time stamp, source and destination IP address and port, and customer and sensor ID. The stream of alarms from each sensor was processed to generate a summary vector of its behavior over time that included the following attributes:

(i) alarm volumes and rates by priority level and across levels,
(ii) ratios of alarm volumes by level,
(iii) aggregate properties of the interalarm times (such as min, max, median),
(iv) number of distinct alarm types encountered by level and across levels,
(v) ten top most frequent alarm types by level and across levels and corresponding relative frequencies,
(vi) volume ratios by day of week,
(vii) volume ratios by time of day,
(viii) number of distinct IP ports and addresses and ten top most frequent IP ports and addresses, encountered as source, destination, and source/destination pairs,
(ix) entropies of the probability distributions for time-of-day, day-of-week, source and destination port, address, and source/destination pair.

Each entropy attribute provides an indication of how random the corresponding distribution is. There are no deep principles to justify this choice of attributes; we felt, however, they provided good summaries of alarm histories at the sensor level.

The population of summary vectors was segmented based on various attributes using demographic clustering in IBM's Intelligent Miner for Data toolkit [4]. This is an iterative algorithm that makes multiple passes over the set of vectors before converging to a locally optimal clustering. The quality of a partitioning is assessed by a global measure, called the Condorcet criterion, which favors clusterings with high intraclass vector similarity and low interclass vector similarity.
In other words, it favors clusters that contain similar vectors when, at the same time, vectors assigned to different clusters are dissimilar. At each step of the process, the algorithm uses the Condorcet criterion to decide whether to assign a vector into an existing cluster, or whether to create a new one. The process ends when an iteration results in no changes to the clustering. In contrast to many popular techniques, demographic clustering does not require an a priori specification of the number of clusters to produce. It can also easily handle mixed numeric and categorical data.

Our results indicated that our sensors fell indeed into a small number of categories with distinct characteristics. A segmentation based on sensor location and alarm volumes, rates, and variety showed one cluster encompassing about 78% of the sensors, corresponding to the "average" well-tuned sensor on a typical corporate network looking at Internet or DMZ traffic. More interestingly, it also showed four other smaller segments. One corresponds to sensors looking at intranet traffic, one grouped together the sensors of a particular client, suggesting a very idiosyncratic network, and another seems to have included sensors with behavior that could improve by tuning, as indicated by the unusually high rate of low level alarms that discriminated that segment from the rest.

The demographic clustering algorithm assigns sensors to clusters with a certain degree of fitness. Low degrees of fitness indicate unusual behavior compared to peer sensors. Closer examination of sensors that stood out from the rest in our example revealed more opportunities for tuning.

Profiling segments in terms of alarm volume by time-of-day and day-of-week shed light on the profiles of demand placed on our central monitoring center by various types of sensors. This is an example where the results of segmentation can lead into a refinement of servicing strategy.
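As an aside on the entropy attributes used in the sensor summary vectors of this section, the following minimal sketch shows how one such attribute could be computed from raw alarm fields. It assumes Shannon entropy in bits over the empirical distribution; the paper does not state the logarithm base or any normalization, and the function name and sample values are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of the empirical distribution of `values`.

    0.0 means every observation is identical; higher values mean the
    observations are spread more evenly over more distinct outcomes.
    """
    counts = Counter(values)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    # Sum of p * log2(1/p) over observed outcomes.
    return sum((n / total) * log2(total / n) for n in counts.values())


# Hypothetical hour-of-day values for the alarms of two sensors.
spread_out = [0, 6, 12, 18]     # alarms evenly spread over four hours
concentrated = [9, 9, 9, 9]     # all alarms in the same hour

h_spread = entropy(spread_out)
h_conc = entropy(concentrated)
```

A sensor whose alarms all arrive at the same hour gets a time-of-day entropy of zero, while one whose alarms are spread evenly over four distinct hours gets two bits. The same function applies unchanged to ports, addresses, or source/destination pairs.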
Another benefit we are currently investigating is the incorporation of a frame of reference in our reports to customers. A segment's profile can serve as frame of reference, against which we can contrast the behavior of a customer's sensors. Rather than present alarm statistics and trends for a customer in isolation, we anticipate presenting them in the context of corresponding information from the unidentified customer's peers. Such contrasts can be used to justify recommendations for changes in security-related areas.

4 Conclusions

We have presented a new approach for dealing with some of the problems of intrusion detection. Some systems use pattern matching for misuse detection while others use anomaly detection [13]. Both approaches have their advantages and disadvantages. In this paper, we have attempted to combine these approaches, by performing anomaly detection on the voluminous results produced by a misuse detection system. We have found our preliminary results encouraging: normal alarm behavior models allowed us to understand complex security-related activity in a computer network and, more importantly, allowed an automated system to ignore the enormous volume of uninteresting alarms.

Our analysis of the profiles of different IBM customers' alarm traffic also produced interesting results. It showed that the behaviors of sensors on different networks varied, but still did exhibit commonalities. The sensor profiles we induced will allow us to present our customers with useful comparative insights. The demonstration of the differences in behavior provides proof that each customer must have his monitoring service custom-tuned in order to provide the best results.

Our next step is to integrate our anomaly detection technique into a real-time system so that results can be made available immediately to the Network Operations Center and IBM's customers. This will require adapting an infrastructure to direct the alarms and analysis results to appropriate components.
It will also require the development of a dynamic retraining system, to deal with the changes in behavior that will occur over time as the monitored networks change.

References

[1] Usama M. Fayyad, Gregory Piatetsky-Shapiro, and Padhraic Smyth. From data mining to knowledge discovery: An overview. In Fayyad et al. [14], chapter 1.
[2] Joseph P. Bigus. Data Mining with Neural Networks. McGraw-Hill, 1996.
[3] Michael J. A. Berry and Gordon Linoff. Data Mining Techniques: For Marketing, Sales, and Customer Support. John Wiley & Sons, 1997.
[4] IBM. Intelligent Miner for Data: User's Guide, 1996.
[5] R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In Proc. of the ACM SIGMOD Conf. on Management of Data, pages 207-216, 1993.
[6] R. Agrawal, H. Mannila, R. Srikant, H. Toivonen, and A. I. Verkamo. Fast discovery of association rules. In Fayyad et al. [14], chapter 12, pages 307-328.
[7] S. Brin, R. Motwani, J. D. Ullman, and S. Tsur. Dynamic itemset counting and implication rules for market basket data. In Proc. of the ACM SIGMOD Conf. on Management of Data, 1997.
[8] R. Agrawal and R. Srikant. Mining sequential patterns. In Proc. of the Intl. Conf. on Data Engineering (ICDE), Taipei, Taiwan, March 1995.
[9] H. Mannila, H. Toivonen, and A. I. Verkamo. Discovering frequent episodes in sequences. In Proc. of the First Intl. Conf. on Knowledge Discovery & Data Mining, Montreal, Canada, August 1995.
[10] Wenke Lee, Salvatore J. Stolfo, and Kui W. Mok. Mining audit data to build intrusion detection models. In Proc. of the Fourth Intl. Conf. on Knowledge Discovery & Data Mining, pages 66-72, New York, NY, August 1998. American Association for Artificial Intelligence.
[11] Andreas Wespi, Marc Dacier, Herve Debar, and Mehdi M. Nassehi. Audit trail pattern analysis for detecting suspicious process behavior. In Proc. of the First Intl. Workshop on Recent Advances in Intrusion Detection, Louvain-la-Neuve, Belgium, September 1998.
[12] Isidore Rigoutsos and Aris Floratos. Motif discovery without alignment or enumeration. In Proc. of the Second Annual ACM Intl. Conf. on Computational Molecular Biology (RECOMB '98), New York, NY, March 1998.
[13] B. Mukherjee, L. T. Heberlein, and K. N. Levitt. Network intrusion detection. IEEE Network, 8(3):26-41, May 1994.
[14] Usama M. Fayyad, Gregory Piatetsky-Shapiro, Padhraic Smyth, and Ramasamy Uthurusamy, editors. Advances in Knowledge Discovery and Data Mining. American Association for Artificial Intelligence, 1996.

Stefanos Manganaris

Stefanos Manganaris is a senior consultant in the Knowledge Discovery Consulting group at IBM. He is responsible for data mining solutions in consulting engagements, education and technology transfer on knowledge discovery issues, and research in machine learning and statistical pattern recognition with applications to business problems across industries. Dr. Manganaris earned a Ph.D. in Computer Science at Vanderbilt University.

Marvin Christensen

Marvin J Christensen is manager of Corporate Emergency Response Services (ERS) at IBM. He is responsible for vulnerability testing, compliance testing, and providing the infrastructure support for ERS's Real-Time Intrusion Detection offering. Mr. Christensen is responsible for the development and implementation of an open architecture RTID monitoring system that supports multiple commercially available RTID systems. He is also responsible for incident handling of Internet computer security incidents for IBM. Mr. Christensen earned his Masters Degree in Computer Science at the University of California in Davis.

Dan Zerkle

Dan Zerkle is a senior Internet security analyst from IBM's Emergency Response Service, specializing in vulnerability analysis and in intrusion detection infrastructure support. He earned his M.S. in Computer Science from the University of California at Davis.
Keith Hermiz

Keith Hermiz serves as principal of the Knowledge Discovery Consulting group within IBM's Global Business Intelligence Solutions division. He received a Ph.D. in Applied Mathematics from the University of Maryland, College Park.