Distributed Audit Trail Analysis

Abdelaziz Mounji, Baudouin Le Charlier, Denis Zampunieris, Naji Habra

Institut d'Informatique, Facultés Universitaires Notre-Dame de la Paix (FUNDP),
rue Grandgagnage 21, B-5000 Namur, Belgium
E-mail: {amo, ble, dza, nha}@info.fundp.ac.be

Research Paper RP-94-007, 15 November 1994

To appear in the ISOC '95 Symposium on Network and Distributed System Security.

Abstract

Computer and network security is currently an active research area. The rising complexity of today's networks leads to more elaborate patterns of attacks. Previous work on stand-alone computer security has established basic concepts and models [3, 4, 5, 7, 8] and described a few operational systems [1, 6, 9, 12, 18]. However, distributed analysis of audit trails is needed for network security for the two following reasons. First, the correlation of user actions taking place at different hosts can reveal malicious behavior, while the same actions may seem legitimate when considered at the level of a single host. Second, monitoring network security can potentially provide a more coherent and flexible enforcement of a given security policy. For instance, the security officer can set up a common security policy for all monitored hosts but choose to tighten the security measures for critical hosts such as firewalls [2] or for suspicious users.

An implemented system for on-line analysis of multiple distributed data streams is presented. The system is conceptually universal since it does not rely on any particular platform feature and uses format adaptors to translate data streams into its own standard format. The system is as powerful as possible (from a theoretical standpoint) yet efficient enough for on-line analysis, thanks to its novel rule-based language (RUSSEL), which is specifically designed for efficient processing of sequential unstructured data streams. In this paper, these generic concepts are applied to security audit trail analysis. The resulting system provides powerful network security monitoring and sophisticated tools for intrusion/anomaly detection. The rule-based and command languages are described, as well as the distributed architecture and the implementation. Performance measurements are reported, showing the effectiveness of the approach.

1 Introduction

A software architecture and a rule-based language for universal audit trail analysis were developed in the first phase of the ASAX project [10, 11, 12]. The distributed system presented here uses this rule-based language to filter audit data at each monitored host and to analyze the filtered data gathered at a central host. The analysis language is exactly the same at both the local and central levels. This provides a tool for flexible and gradual granularity control at different levels: users, hosts, subnets, domains, etc.

Auditing distributed environments is useful for understanding the behavior of software components. For instance, this is useful when testing new applications: an execution trace can be analyzed to check its correctness with respect to the requirements. In the area of real-time process control, critical hardware or software components are supervised by generating log data describing their behavior. The collection and analysis of these log files often has to be done in real time, in parallel with the audited process. This analysis can be conducted for various purposes such as investigation, recovery and prevention, production optimization, alarm and statistics reporting.
In addition, the correlation of results obtained at different nodes can be useful to achieve a more comprehensive view of the whole system.

The rest of this paper is organized as follows. Section 2 briefly describes the system for single audit trail analysis and its rule-based language. Section 3 details the functionalities offered by the distributed system. Section 4 presents the distributed architecture. Section 5 describes the command interface of the security officer. In Section 6, the implementation of the main components is outlined. Performance measurements are reported in Section 7. Finally, Section 8 contains the conclusion and indicates possible improvements of this work.

2 Single Audit Trail Analysis

In this section, the main features of the stand-alone version of ASAX for single audit trail analysis are explained; we only emphasize the most interesting functionalities. The reader is referred to [12] for a more detailed description. (Note, however, that [12] is a preliminary description of a system under implementation; the examples in the present paper have actually been run on the implemented system.) A comprehensive description of ASAX is presented in [10, 11].

2.1 A motivating example

The use of the RUSSEL language for single audit trail analysis is best introduced by a typical example: detecting repeated failed login attempts from a single user during a specified time period. This example uses the SunOS 4.1 auditing mechanism. Native audit trails are translated into a standard format (called NADF); the translation can be applied on-line or off-line. Hence, the description below is based on the NADF format of the audit trail records.

Assuming that login events are pre-selected for auditing, every time a user attempts to log in, an audit record describing this event is written into the audit trail. The audit record fields (or audit data) found in a failed login record include the time stamp (au_time), the user id (au_text_3) and a field indicating success or failure of the attempted login (au_text_4). Notice that audit records representing login events are not necessarily consecutive, since other audit records can be inserted for events generated by other users of the system.

In the example (see Figure 1), RUSSEL keywords are noted in bold face characters, words in italic style identify fields in the current audit record, and rule parameters are noted in roman style. Two rules are needed to detect a sequence of failed logins. The first one (failed_login) detects the first occurrence of a login failure. If such a record is found, this rule triggers off the rule count_rule, which remains active until it detects count_down failed logins among the subsequent records or until its expiration time arrives. The parameter target_uid of rule count_rule is needed to count only failed logins that are issued by the same user (target_uid). If the current audit record does not correspond to a login attempt by the same user, count_rule simply retriggers itself for the next record. If the user id in the current record is the same as its argument and the time stamp is lower than the expiration argument, it retriggers itself for the next record after decrementing the count_down argument. If the latter drops to zero, count_rule writes an alarm message to the screen indicating that a given user has performed max_times unsuccessful logins within a period of duration seconds.
In addition, count_rule retriggers the failed_login rule in order to search for other similar patterns in the rest of the audit trail. To initialize the analysis process, the special rule init_action makes the failed_login rule active for the first record and also makes the print_results rule active at completion of the analysis. The latter rule is used to print results accumulated during the analysis, such as the total number of detected sequences.

global v: integer;

rule failed_login(max_times, duration: integer);
if event = 'login_logout' and au_text_4 = 'incorrect password'
--> trigger off for_next
      count_rule(au_text_3, strToInt(au_time) + duration, max_times - 1)
;

rule count_rule(target_uid: string; expiration, count_down: integer);
if event = 'login_logout' and au_text_4 = 'incorrect password'
   and au_text_3 = target_uid and strToInt(au_time) < expiration
--> if count_down > 1
    --> trigger off for_next
          count_rule(target_uid, expiration, count_down - 1);
    count_down = 1
    --> begin
          v := v + 1;
          println(gettime(au_time), ': 3 FAILED LOGINS ON ', target_uid);
          trigger off for_next failed_login(3, 120)
        end
    ;
strToInt(au_time) > expiration
--> trigger off for_next failed_login(3, 120);
true
--> trigger off for_next count_rule(target_uid, expiration, count_down)
;

rule print_results;
begin
  println(v, ' sequence(s) of bad logins found')
end;

init_action;
begin
  v := 0;
  trigger off for_next failed_login(3, 120);
  trigger off at_completion print_results
end.

Figure 1: SunOS 4.1 RUSSEL module for failed login detection

2.2 Salient features of ASAX

2.2.1 Universality

This feature means that ASAX is theoretically able to analyze arbitrary sequential files. This is achieved by translating the native file into a format called NADF (Normalized Audit Data Format). According to this format, a native record is abstracted to a sequence of audit data fields, all of which are considered as untyped strings of bytes. Therefore, an audit data item in the native record is converted to three fields: an identifier (a 2-byte integer) which identifies the data field among all possible data fields; a length (a 2-byte integer); and a value, i.e., a string of bytes. A native record is encoded in NADF format as the sequence of the encodings of its data fields, with a leading 4-byte integer representing the length of the whole NADF record. (In fact, native files can be translated to NADF format in many different ways, depending on the problem at hand. The standard method proposed here was, however, sufficient for the applications we have encountered so far.)

Note that the NADF format is similar to the TLV (Tag, Length, Value) encoding used for the BER (Basic Encoding Rules), which is part of the Abstract Syntax Notation ASN.1 [14]. The TLV encoding is more complex, however, since it supports typed primitive data values such as boolean, real, etc. as well as constructor data types. Nevertheless, any data value can in principle be represented as a string of bytes. As a result, the flexibility of the NADF format allows a straightforward translation of native files and a fast processing of NADF records by the universal evaluator.
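To make the encoding concrete, the following C fragment sketches it. This sketch is ours and not part of the ASAX sources; the byte order of the integer fields is not fixed by the description above, so the host byte order used here is an assumption, as is the convention that the leading length counts itself.

    /* Minimal sketch of the NADF encoding (hypothetical helper, not the
       ASAX sources; host byte order is assumed for the integer fields). */
    #include <string.h>

    /* Append one audit data field: a 2-byte identifier, a 2-byte length
       and the value itself; returns the new write position. */
    static char *put_field(char *p, unsigned short id, unsigned short len,
                           const char *value)
    {
        memcpy(p, &id, 2);   p += 2;
        memcpy(p, &len, 2);  p += 2;
        memcpy(p, value, len);
        return p + len;
    }

    /* Frame a whole NADF record: buf[0..3] was reserved for the leading
       4-byte length, and the encoded fields follow it up to fields_end. */
    static unsigned long put_record(char *buf, const char *fields_end)
    {
        unsigned long total = (unsigned long)(fields_end - buf);
        memcpy(buf, &total, 4);   /* length of the whole NADF record */
        return total;
    }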
2.2.2 The RUSSEL language

RUSSEL (RUle-baSed Sequence Evaluation Language) is a novel language specifically tailored to the problem of searching for arbitrary patterns of records in sequential files. The built-in mechanism of rule triggering allows a single-pass analysis of the sequential file from left to right. The language provides common control structures such as conditional, repetitive and compound actions. Primitive actions include assignment, external routine call and rule triggering. A RUSSEL program simply consists of a set of rule declarations, each made of a rule name, a list of formal parameters and local variables, and an action part. RUSSEL also supports modules sharing global variables and exported rule declarations.

The operational semantics of RUSSEL can be sketched as follows:

- records are analyzed sequentially. The analysis of the current record consists in executing all active rules. The execution of an active rule may trigger off new rules, raise alarms, write report messages or alter global variables, etc.;

- rule triggering is a special mechanism by which a rule is made active, either for the current record or for the next one. In general, a rule is active for the current record because a prefix of a particular sequence of audit records has been detected. (The rest of this sequence may still be found in the rest of the file.) The actual parameters in the set of active rules represent knowledge about the already found subsequence and are useful for selecting further records in the sequence;

- when all the rules active for the current record have been executed, the next record is read and the rules triggered for it in the previous step are executed in turn;

- to initialize the process, a set of so-called init rules are made active for the first record.

User-defined and built-in C routines can be called from a rule body. A simple and clearly specified interface with C allows the RUSSEL language to be extended with any desirable feature: simulation of complex data structures, sending an alarm message to the security officer, locking an account in case of an outright security violation, etc.

2.2.3 Efficiency

Efficiency is a critical requirement for the analysis of large sequential files, especially when on-line monitoring is involved. RUSSEL is efficient thanks to its operational semantics, which exhibits a bottom-up approach in constructing the searched record patterns. Furthermore, optimization issues are carefully addressed in the implementation of RUSSEL: for instance, the internal code generated by the compiler ensures a fast evaluation of boolean expressions, and the current record is pre-processed before evaluation by all the active rules, in order to provide direct access to its fields.
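The record-processing cycle described above can be summarized by the following C sketch. This is our reading of the operational semantics, not the ASAX implementation; the two rule queues and all names and types are hypothetical.

    /* Sketch of the RUSSEL evaluation cycle (our reading of the semantics
       above; all names and types are hypothetical, not the ASAX sources). */
    #include <stddef.h>

    typedef struct rule_inst {
        void (*body)(struct rule_inst *self);  /* compiled action part */
        long args[4];                          /* actual parameters    */
        struct rule_inst *link;
    } rule_inst;

    static rule_inst *active_now;   /* rules active for the current record */
    static rule_inst *active_next;  /* rules triggered for the next record */

    /* 'trigger off for_next R(...)': called from inside a rule body */
    static void trigger_for_next(rule_inst *r)
    {
        r->link = active_next;
        active_next = r;
    }

    static void analyze(int (*read_record)(void))
    {
        /* the init rules have already filled active_next for the first record */
        while (read_record()) {          /* also pre-processes the record fields */
            active_now  = active_next;   /* rules triggered in the previous step */
            active_next = NULL;
            for (rule_inst *r = active_now; r != NULL; r = r->link)
                r->body(r);              /* may trigger rules, raise alarms, ... */
        }
        /* 'at_completion' rules would be executed here */
    }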
3 Administrator-Minded Functionalities

3.1 Introduction

The previous sections showed that ASAX is a universal, powerful and efficient tool for analyzing sequential files in general, and audit trails in particular. In this section, the functionalities of a distributed version of ASAX are presented in the context of distributed security monitoring of networked computers. The implemented system applies to a network of SUN workstations using the C2 security feature and uses PVM (Parallel Virtual Machine) [15] as its message passing system. However, the architecture design makes no assumption about the communication protocol, the auditing mechanism or the operating system of the involved hosts.

3.2 Single point administration

In a network of computers and in the context of security auditing, it is desirable that the security officer has control of the whole system from a single machine. The distributed on-line system must be manageable from a central point where global knowledge about the status of the monitoring system can be maintained and administered in a flexible fashion. Management of the monitoring system involves various tasks such as activation of distributed evaluators and auditing granularity control. Therefore, monitored nodes are, in a sense, considered as local objects on which administration tasks can be applied in a transparent way, as if they were local to the central machine.

3.3 The local and global analyses

The local analysis requirement corresponds to the ability to analyze any audit trail associated with a monitored host. This is achieved by applying an appropriate RUSSEL module to a given audit trail of a given host. The analysis is considered local in the sense that the analyzed audit data represent events taking place at the involved host; no assumption is otherwise made about which host actually performs the analysis. Local analysis is also called filtering since, at the network level, it serves as a pre-selection of relevant events. In fact, pre-selected events may correspond to arbitrarily complex patterns of subject behaviors.

Audit records filtered at the various nodes are communicated to a central host where a global (network level) analysis takes place. In its most interesting use, global analysis aims at detecting patterns related to the global network security status rather than to the security status of a single host. In this regard, global analysis encompasses a higher level and more elaborate notion of security event. This concerted local and global analysis approach lends itself naturally to a hierarchical model of security events, in which the components of a pattern are detected at a lower level and a more aggregate pattern is derived at the next higher level, and so on. Note that an aggregate pattern could exhibit a malicious security event while the corresponding sub-patterns do not at all. For instance, a login failure by a user is not an outright security violation, but the fact that this same user is trying to connect to an abnormally high number of hosts may indicate that a network attack is under way. Organizations often use networks of interconnected LANs corresponding to departments. The hierarchical model can be mapped onto the organization hierarchy by applying a distributed analysis to each of the LANs and an organization-wide analysis to the audit data filtered at each LAN. Thus, concerted filtering and global analysis can lead to the detection of very complex patterns.

In the following, the node performing the global analysis is called the central or master machine, while filtering takes place at slave machines. Correspondingly, we will also refer to master and slave evaluators. A distributed evaluator is a master evaluator together with its associated slave evaluators.

3.4 Availability

This requirement means that a distributed evaluator must survive the failure of any of its slave evaluators and must easily be recovered in case of a failure of the master evaluator. The availability of a distributed evaluator ensures that if for some reason a given slave is lost (broken connection, fatal error in the slave code itself, node crash, etc.), the distributed analysis can still be carried out on the rest of the monitored hosts. On the other hand, if the master evaluator fails, the distributed analysis can be resumed from another available host. In all cases, and especially for on-line analysis, all generated audit records must remain available for analysis (no records are lost). Distributed analysis recovery must also be done in a flexible way and require minimum effort.
3.5 Logging control

This functionality involves control of the granularity of security events at the network, host and user levels. Typically, the security officer must be able to set up a standard granularity for most audited hosts and to require a finer granularity for a particular user or for all users of a particular host. According to the single point administration requirement, this also means that logging control is carried out from the central machine, without the need for multiple logins to remote hosts.

4 Architecture

The architecture of the distributed system is addressed at two different levels. At the host level, a number of processes cooperate to achieve logging control and filtering. The global architecture supports the network level analysis. This section aims at giving an intuitive view of the overall distributed system.

4.1 Host level

The processes in the local architecture are involved in the generation of audit data, the control of its granularity level, the conversion of audit data to NADF format, the analysis of audit records and, finally, the transmission of filtered sequences to the central evaluator. At the master host, a network level analysis subsequently takes place on the stream of records resulting from merging the records incoming from the slave machines. Both global and local analyses are performed by a slightly modified version of the analysis tool outlined in the previous section.

4.1.1 Audit trail generation

This mechanism is operating system dependent. It generates audit records representing events such as operations on files, administrative actions, etc. It is assumed that all monitored hosts provide auditing capabilities and a mechanism for controlling the granularity level. The process generating the audit records is called the audit daemon (auditd for short).

4.1.2 Login controller

This process communicates with auditd in order to alter the granularity. It is able to change the set of preselected events, on a user, host and network basis. Furthermore, we distinguish between a temporary change, which applies to the current login session, and a permanent change, which also affects all subsequent sessions.

4.1.3 Format adaptor

This process translates the audit trails generated by auditd into the NADF format. Native files can be erased after being converted, since they are semantically redundant with the NADF files. Keeping converted files instead of native files has several advantages: the files are converted only once and can be reanalyzed several times without requiring a new conversion; moreover, in the context of a heterogeneous network, they provide a standard and unique format.

Figure 2: System Architecture. (On each monitored host, a format adaptor translates the native audit files into NADF files, which a local evaluator filters; the filtered audit records flow over the network to the central evaluator for global analysis, under control of the console.)

4.1.4 Local evaluator

It analyzes the NADF files generated by the format adaptor. Note that several instances of the evaluator can be active at the same time, performing analyses on different NADF files or possibly on the same file. Off-line and on-line analyses are implemented in the same way; the only difference is that in on-line mode, the evaluator analyzes the file currently being generated. These processes will be further described in Section 6.
Audit records filtered by the slave evaluators on the various monitored slave machines are sent to the central machine for global analysis.

4.2 Network level

At the network level, the system consists of one or more slave machines running the processes previously described and a master machine running the master evaluator (see Figure 2). The latter performs the global analysis on the audit record stream resulting from local filtering. The result of the central analysis can be network security status reports, alarms, statistics, etc. In addition, a console process runs on the master machine. It provides an interactive command interface to the distributed monitoring system; this command interface is briefly described in the next section.

5 The Command Language of the Distributed System

5.1 Preliminaries

This section presents the command interface used by the security officer. In the following, evaluator instances are identified by their PVM instance numbers, which are similar to process ids in UNIX systems. Auditable events are determined by a comma-separated list of audit flags, which are borrowed from the SunOS 4.1 C2 security notation for event classes. (The SunOS 4.1 C2 security features are described in detail in [16].) These audit flags are listed in Table 1.

flag  short description         example
dr    data read                 stat(2)
dw    data write                utimes(2)
dc    object create/delete      mkdir(2)
da    object access change      chmod(2)
lo    login, logout             login(1)
ad    administrative operation  su(1)
p0    privileged operation      quota(1)
p1    unusual operation         reboot(2)

Table 1: SunOS C2 security audit flags

Audit flags can optionally be preceded by + (resp. -) to select only successful (resp. failed) events. For instance, the list of audit flags +dr,-dw,lo,p0,p1 specifies that successful data reads, failed data writes, all logins and logouts, and all privileged and unusual operations are selected.

Under SunOS, the file /etc/security/audit/audit_control contains (among other things) a list of audit flags determining the set of auditable events for all users of the system. The file /etc/security/passwd.adjunct, called the shadow file, contains one line per user indicating the events to be audited for this particular user. The actual set of auditable events for a given user is derived from the system audit value and the user audit value according to some priority rules.

Finally, audit trails in NADF format respect naming conventions based on their creation and closing times, which makes it easy to select the files generated during a given time interval. For instance, the file named timei.timef.NADF contains the events generated by auditd in the time interval [timei, timef].
Supported commands fall into two categories: analysis control commands and logging control commands.

5.2 Analysis control commands

The commands for distributed analysis allow the security officer to start, stop and modify a distributed analysis. To start a new distributed analysis on a set of monitored hosts, one first prepares a text file specifying the involved hosts, the RUSSEL modules to be applied on these hosts and, optionally, an auditing period (a time interval which is the same for each node); by default, the analysis is performed on-line. This file is given as an argument to the run command. Using the rerun command, the security officer can change the attributes of an active distributed evaluator, either by changing the rule modules on some hosts (master or slave) or by changing the time interval used by the whole distributed evaluator; rerun is parameterized by an evaluator instance number and a rule module or a time interval. The kill command stops an evaluator identified by its instance number. The ps command reports the attributes of all active distributed evaluators; the attributes of an evaluator include its instance number, the instance number of the corresponding master evaluator, the host name, the rule module and the time interval. It is possible to activate several distributed evaluators which run independently of each other. The command reset stops all current distributed evaluators.

5.3 Logging control commands

The command logcntl implements the logging control functionality (see Section 3.5). It allows the security officer to alter the granularity level for any monitored user or host: the auditable events for a particular user on a particular host are changed according to the list of audit flags supplied to logcntl. With the option -t, the change takes effect immediately, but the settings remain in effect only during the current login session. With the option -p, the change takes effect the next time the user logs in and for every subsequent login session, until this command is invoked again. On a host basis, the security officer can alter the system audit value of a specified host by supplying a host name and a list of audit flags.

Although the logcntl command relies closely on the SunOS formalism for specifying auditable events and altering the set of events currently audited, it would be possible to develop a system-independent event classification as well as a portable auditing configuration. Nevertheless, as SunOS 4.1 uses an event classification and auditing configuration that are similar to those of most operating systems, the current solution is sufficient for a prototype system.

5.4 Example

In this section, the failed login detection example introduced in Section 2.1 is reconsidered in the context of a distributed analysis. The purpose is still to detect repeated failed login attempts, but now the failed login events can occur at any of the monitored hosts (here we consider two hosts, viz. poireau and epinard). According to the filtering/global analysis principle, a slave evaluator is activated on each host (poireau and epinard) and a master evaluator is initiated on poireau. Each slave evaluator only filters the failed login records from its local host and sends them to the master evaluator, which then analyzes the filtered record stream to detect sequences of failed logins. As indicated in the evaluator description file shown in Figure 3, filtering is implemented in RUSSEL by the rule module badlogin.asa, while the sequence of failed logins is detected using the rule module nbbadlogin.asa. This file also contains the time interval to which the analysis is applied. Figures 4 and 5 show the contents of badlogin.asa and nbbadlogin.asa respectively. Notice that the master evaluator does not check that records correspond to login failure events, since this is already done by the associated slave evaluators.

master poireau: nbbadlogin:[19940531170431,19940601173829];
slaves poireau, epinard: badlogin.

Figure 3: Distributed Analysis Description File

rule failed_login;
begin
  if event = 'login_logout' and au_text_4 = 'incorrect password'
  --> send_current
  ;
  trigger off for_next failed_login
end;

init_action;
begin
  trigger off for_next failed_login
end.

Figure 4: Slave evaluator module: badlogin.asa

Figure 6 shows how the distributed evaluator is activated using the interactive console window. The lower window contains the distributed analysis interactive console: the security officer has just invoked the run command with the name of the evaluator description file as argument. The upper window is the Unix console where the outputs from the master evaluator are printed.

Figure 6: Console windows
6 Overview of the Implementation

The implementation of the rule-based language RUSSEL is out of the scope of this paper; it is fully explained in [10, 11]. We only consider the implementation of the distributed aspects. However, it is worth noticing that very few modifications were necessary to handle record streams instead of ordinary audit trails. In addition to the auditd process, the following concurrent processes are attached to each monitored host (see Figure 7).

6.1 Distributed format adaptor (FA)

The distributed format adaptor fadapter translates SunOS audit files into the NADF format. It also observes date- and time-based naming conventions for NADF files: a NADF file consisting of the chronological sequence of audit records R0, ..., Rn-1 is named time0.timen.NADF, where time0 is the time and date found in R0 and timen is the time stamp of Rn-1 plus one second. Both time0 and timen are written with 4 decimal digits for the year and 2 decimal digits for each of the month, day, hour, minute and second. The current NADF file has a name of the form time0.not_terminated.NADF, where time0 is the time stamp of its first record.

The current native and NADF files are limited to a maximum size, which is recorded in the file nadf_data. The process sizer sends a signal to auditd (resp. fadapter) if the maximum size for the current native (resp. NADF) file is reached. When auditd or fadapter receives such a signal, it closes the current file and continues on a new one. The maximum size can be changed at any time through a simple RPC (Remote Procedure Call) server, d_size_svc, after a request from the console process; d_size_svc updates the file nadf_data accordingly. The distributed FA is automatically started at boot time of each monitored host from /etc/rc.local.

6.2 Logging control

Changing the granularity level for a user or a host is performed remotely from the security officer's console by a remote update of the auditd configuration of the involved host. Therefore, logging control is implemented by means of RPC. For this purpose, a server process logcntl_svc is attached to each monitored host, accepting requests from the console process running on the master machine. Depending on the option used for the command logcntl, the console process calls an appropriate procedure offered by the logcntl_svc server on the involved host. Following the RPC model, logcntl_svc transfers control to the appropriate service procedure and then sends back a reply to the console process indicating the outcome of the call. It was not possible to implement this communication using PVM, since all processes participating in the Parallel Virtual Machine must belong to the same user, while the logcntl_svc server requires root privileges to access the shadow password file. Moreover, the security officer should not necessarily own root privileges.

6.3 Supplier process

This process runs on each monitored host. It sends to its evaluator a record stream corresponding to a given time interval. It receives from the console process on the master machine the instance number of its associated evaluator and a time interval. It retrieves the corresponding records from the NADF files and sends them in sequence, using a PVM message for each record.
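The per-record transmission might look like the following fragment, written in PVM 3 style for this paper; the message tag, the message layout and even the PVM version are assumptions, not the ASAX sources.

    /* Sketch of the supplier's per-record transmission (hypothetical;
       PVM 3 style calls, tag and layout are assumptions). */
    #include <pvm3.h>

    #define TAG_RECORD 1        /* one PVM message per NADF record */

    static void send_record(int ev_tid, char *rec, int len)
    {
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&len, 1, 1);         /* record length ...              */
        pvm_pkbyte(rec, len, 1);       /* ... and the raw NADF record    */
        pvm_send(ev_tid, TAG_RECORD);  /* ev_tid: the evaluator instance */
    }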
6.4 Evaluator process

It is interesting to note that the slave and master evaluators are implemented by exactly the same code. This is possible at the cost of providing the additional supplier process, which hides the details of how audit records are retrieved: for slave evaluators, the records are received from the supplier process, while a master evaluator receives them from its slave evaluators.

The evaluator process (on master and slave machines) is the heart of the distributed system. It analyzes record streams according to a rule module. If the evaluator is a master evaluator, the record stream originates from a set of slave evaluators and the result of the analysis may be reports, alarms, statistics, etc. If the evaluator is a slave evaluator, there is only one sending process (the supplier process) and, in this case, the result is a filtered sequence of audit records which is sent to the master evaluator.

The console process can change the rule module used by an evaluator by sending it the name of the new module to be applied. At reception, the evaluator executes the completion rules, compiles the new module, executes the resulting init-actions and then waits for audit records. The time interval can also be changed for all the evaluators participating in a distributed analysis. For this purpose, the console process sends the new time interval to all involved supplier processes and notifies the evaluators of the change. Upon reception, the supplier processes send to the evaluators a record stream determined by the new time interval; the evaluator processes only execute completion rules and init-actions. The completion rules report the results of the previous analysis before the current rule module or the time interval is changed.

global v: integer;

rule failed_login(max_times, duration: integer);
begin
  trigger off for_next
    count_rule(au_text_3, strToInt(au_time) + duration, max_times - 1)
end;

rule count_rule(target_uid: string; expiration, count_down: integer);
if au_text_3 = target_uid and strToInt(au_time) < expiration
--> if count_down > 1
    --> trigger off for_next
          count_rule(target_uid, expiration, count_down - 1);
    count_down = 1
    --> begin
          v := v + 1;
          println(gettime(au_time), ': 3 FAILED LOGINS ON ', target_uid);
          trigger off for_next failed_login(3, 120)
        end
    ;
strToInt(au_time) > expiration
--> trigger off for_next failed_login(3, 120);
true
--> trigger off for_next count_rule(target_uid, expiration, count_down)
;

rule print_results;
begin
  println(v, ' sequence(s) of bad logins found')
end;

init_action;
begin
  v := 0;
  trigger off for_next failed_login(3, 120);
  trigger off at_completion print_results
end.

Figure 5: Master evaluator module: nbbadlogin.asa

Figure 7: Local Architecture. (On each monitored host, auditd writes the native audit file; fadapter converts it into NADF files, from which the supplier feeds the evaluator; sizer signals auditd and fadapter when the maximum sizes recorded in nadf_data are reached; d_size_svc and logcntl_svc serve RPC requests from the console, the latter updating audit_control and passwd.adjunct. The processes interact through PVM communication, Unix signals and file read/write accesses.)
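Our reading of the module-change protocol can be summarized by the following C sketch; the message tags and helper routines are hypothetical, since the paper does not show the evaluator's code.

    /* Sketch of the evaluator's message loop (hypothetical; tags and
       helpers are assumptions based on the protocol described above). */
    #include <pvm3.h>

    #define TAG_RECORD     1
    #define TAG_NEW_MODULE 2

    extern void run_completion_rules(void);        /* report old results */
    extern void compile_module(const char *name);  /* install new module */
    extern void run_init_actions(void);            /* trigger init rules */
    extern void evaluate_active_rules(char *rec, int len);

    static void evaluator_loop(void)
    {
        char rec[4096], name[256];
        int len, nbytes, tag, tid;

        for (;;) {
            int bid = pvm_recv(-1, -1);           /* any sender, any tag */
            pvm_bufinfo(bid, &nbytes, &tag, &tid);
            if (tag == TAG_NEW_MODULE) {
                pvm_upkstr(name);
                run_completion_rules();
                compile_module(name);
                run_init_actions();               /* then wait for records */
            } else if (tag == TAG_RECORD) {
                pvm_upkint(&len, 1, 1);
                pvm_upkbyte(rec, len, 1);
                evaluate_active_rules(rec, len);
            }
        }
    }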
6.5 Console process

This process was already partially described in the previous sections. A single instance of the console process exists; it is active on the master machine under the control of the security officer, through the command interface described in Section 5. It maintains the status of all active distributed evaluators and coordinates all the processes of the distributed system. Under interactive control of the security officer, the console process can also invoke the remote logcntl_svc RPC server to change the current granularity level on a given host. To activate a distributed analysis as indicated in a distributed evaluator description file, the console process initiates an evaluator-supplier pair on each slave host and a master evaluator on the master host. It then sends the time interval to all supplier instances and the appropriate rule module to each evaluator instance. When all the suppliers are positioned in the time interval and the evaluators have successfully compiled their modules, the console process starts the analysis by triggering record stream transmissions from the suppliers to the slave evaluators.

7 Performance Measurements

7.1 Introduction

This section reports some performance tests of our system. These measurements aim at showing the feasibility and effectiveness of the distributed system in terms of response time and network load. It also follows from these measurements that on-line monitoring is feasible. The experiments were carried out on two SUN SPARCstation 1 workstations running the C2 security level of SunOS 4.1 and connected to a 10 Mbit/s Ethernet. Each machine has 16 Mbytes of random access memory. In addition, a third machine on the Ethernet is used as a file server where the NADF files generated at each host are stored using NFS (Network File System). The first experiment measures the overhead due to the distributed architecture with respect to the same analysis performed on a single audit trail. The second one compares the performance of a distributed audit trail analysis with that of a centralized audit trail analysis. The last experiment shows the benefits of executing several analyses in parallel.

7.2 Overhead of the distributed architecture

In order to measure the overhead introduced by the distributed architecture, we analyzed a single audit file of 500 Kbytes using the single audit trail analysis version on the one hand, and the distributed version on the other hand. The analyzed file represents two days of system usage by two users. The audited events are file operations as well as normal administrative operations such as the su and login commands. In the first case, the audit records are simply retrieved from the audit file using input/output routines. The second case corresponds to a degenerate distributed evaluator composed of a single slave evaluator. The overhead introduced is mainly due to the network communication (using PVM) between the slave and the master. [17] describes experiments comparing the communication times of a number of different network programming environments on isolated two- and four-node networks. Since the messages exchanged in the distributed system are around 300 bytes in size, it follows from the measurements conducted in [17] that the average data transfer rate is around 0.049 Mbytes/sec (that is, roughly 6 ms per transmitted record). The slave evaluator applies the badlogin.asa module as explained earlier and the master evaluator runs the nbbadlogin.asa module. Table 2 gives the mean values of the CPU and elapsed times (in seconds) for the stand-alone analysis (SAA) and the distributed analysis (DA).

type  usr   sys   total  elapsed
SAA   1.13  0.68  1.81   7
DA    3.43  3.73  7.20   53.55

Table 2: Stand-alone vs. Distributed Analysis

The results suggest that distributed audit trail analysis is feasible, since the elapsed time of the analysis is negligible with respect to the time spent generating the audit data (2 days). However, the overhead due to the distributed architecture is significant: most of the elapsed time is spent in communication between the nodes. Consequently, improving the distributed system response time involves optimizing the network communication.

7.3 Centralized vs. distributed audit trail analysis

This section reports the performance benefits of distributed network security monitoring over centralized network security monitoring.
In the latter approach, the monitored nodes do not perform any intelligent filtering of the audit data: all the audit records generated at a node are sent to a central host where the analysis takes place. (We assume here that a simple pre-selection of auditable events cannot be considered as intelligent filtering.) As shown in Table 3, the distributed analysis (DA) has the advantage of drastically reducing the network traffic in comparison with the centralized analysis (CA). It also achieves a balancing of the CPU time over several machines: the CPU time of the master evaluator is smaller, since part of the analysis is carried out by the slave evaluators on the slave machines. A system using a centralized architecture for network audit trail analysis is presented in [18].

type  usr    sys    total  elapsed  traffic (Kbytes)
CA    11.90  13.60  25.56  265.78   2,661
DA    1.15   7.46   8.61   188.56   39

Table 3: Distributed vs. Centralized Analysis

7.4 Parallel vs. sequential analysis

The RUSSEL language allows more than one analysis to be executed at the same time, i.e., during a single analysis of a given audit file, several independent rule modules can be executed. For instance, we can search in parallel for repeated failed logins as well as for repeated attempts to corrupt system files. We used 4 distributed evaluators, described by their distributed evaluator description files; all the analyses are limited to a specified time interval, as shown in Figure 8.

bad su commands:
master poireau: nbbadsu:[19940531170524,19940606173854];
slaves poireau, epinard: badsu.

System corruption:
master poireau: fscorrupt:[19940531170524,19940606173854];
slaves poireau, epinard: corrupt.

Set user id files:
master poireau: setuid:[19940531170524,19940606173854];
slaves poireau, epinard: create.

trojan su:
master poireau: trojan:[19940531170524,19940606173854];
slaves poireau, epinard: exec.

Figure 8: Multiple Distributed Analyses: RUSSEL modules for the master and the slaves

The purpose of the first analysis is to detect 3 repeated failures to break a given account using the su command. Each of the hosts poireau and epinard runs a slave evaluator which detects unsuccessful su commands; the master evaluator detects sequences of 3 failed su commands invoked at any of the two monitored hosts. The purpose of the second distributed analysis is to detect attempts to corrupt system files on either of the two hosts. System file corruption can be the deletion, creation or attribute modification of any system file or directory. Each slave evaluator applies the RUSSEL module corrupt.asa, which detects deletion, creation or attribute modification of files; the master evaluator uses the module fscorrupt.asa to check that such operations involve a system file. The third analysis aims at detecting new set-user-id files. For this purpose, the slave evaluators on epinard and poireau detect the creation of files in hidden directories such as /tmp or /usr/tmp and the modification of their access flags.
At the master evaluator, the module setuid.asa is used to detect the creation of a file in a hidden directory followed by a modification of the access flags of this same file, such that the created file is a set-user-id file. The last analysis searches for trojan system programs such as su. The slaves detect the execution of any command using the module exec.asa, while the master applies the module trojan.asa to check whether such an execution involves a trojan program.

The multiple distributed analysis amounts to executing these distributed analyses one after the other, using the run command with the appropriate distributed evaluator description file as argument. The corresponding execution times are reported in Table 4. In the case of parallel execution, we activate a single distributed evaluator which performs the four analyses at the same time. For this purpose, the master evaluator uses a rule module (master_module.asa) which includes all the modules applied by each of the above 4 masters (see Figure 9). Similarly, the slave evaluators (on epinard and poireau) run a single module (slave_module.asa) which includes the 4 modules applied by the previous slave evaluators. The distributed evaluator description file for the parallel execution is depicted in Figure 10.

master_module.asa: uses nbbadsu, fscorrupt, setuid, trojan.
slave_module.asa: uses badsu, corrupt, create, exec.

Figure 9: Parallel analysis: RUSSEL modules for the master and slave

parallel distributed evaluator:
master poireau: master_module:[19940531170524,19940606173854];
slaves poireau, epinard: slave_module.

Figure 10: Parallel Analysis Description File

type       usr    sys    total  elapsed
nbbadsu    2.43   10.18  12.61  159.48
fscorrupt  2.33   12.51  14.84  176.90
setuid     3.10   11.98  15.08  182.46
trojan     2.83   11.83  14.66  184.09
total      10.69  46.50  57.19  702.92
parallel   8.03   15.03  23.06  209.83

Table 4: Multiple Distributed Analysis vs. Parallel Analysis

The execution times for the parallel analysis are given in the last line of Table 4. It follows from this table that the performance gain is substantial. Note that the elapsed time of the parallel analysis is not significantly different from the elapsed time of a single analysis. This suggests that complex on-line analyses (combining many single analyses in parallel) are feasible.

8 Conclusions and Future Works

This paper presented an implemented system for the on-line analysis of multiple distributed data streams. The universality of the system makes it conceptually independent of any architecture or operating system. This is achieved by means of format adaptors, which translate data streams into a canonical format. The rule-based language (RUSSEL) is specifically designed for analyzing unstructured data streams. This makes the presented system (theoretically) as powerful as possible, while remaining efficient enough for solving complex queries on the data streams. We also presented the distributed architecture of the system and its implementation. The effectiveness of the distributed system was demonstrated by reporting performance measurements conducted on real network attack examples. These measurements also showed that on-line distributed analysis is feasible even for complex problems.

Further work will tackle the problem of reducing the overhead due to network communication. In the present version of the system, audit records are transmitted using one PVM message per record. A first improvement is to buffer audit records before packing them into a single PVM message. Another improvement involves a direct use of standard communication protocols such as TCP/IP instead of PVM. More standard protocols will increase the portability and the robustness of our system.
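As an illustration of this first improvement, the following C fragment sketches how a supplier could batch records into a single PVM message. It is a hypothetical sketch in PVM 3 style; the buffer size, message tag and framing are our assumptions, not part of the system described above.

    /* Hypothetical sketch of the suggested buffering improvement:
       pack several NADF records into a single PVM message (PVM 3
       style; buffer size, tag and framing are assumptions). */
    #include <string.h>
    #include <pvm3.h>

    #define TAG_BATCH 3

    static char batch[16 * 1024];
    static int  used = 0, nrec = 0;

    static void flush_batch(int ev_tid)
    {
        if (nrec == 0) return;
        pvm_initsend(PvmDataDefault);
        pvm_pkint(&nrec, 1, 1);        /* number of records in the batch */
        pvm_pkbyte(batch, used, 1);    /* concatenated NADF records      */
        pvm_send(ev_tid, TAG_BATCH);
        used = nrec = 0;
    }

    static void buffer_record(int ev_tid, char *rec, int len)
    {
        if (used + len > (int)sizeof batch)
            flush_batch(ev_tid);
        memcpy(batch + used, rec, len);  /* records are self-delimiting:
                                            each starts with its length */
        used += len;
        nrec++;
    }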
References

[1] A. Baur, W. Weiss. Audit Analysis Tool for Systems with High Demands Regarding Security and Access Control. Research Report ZFE F2 SOF 42, Siemens Nixdorf Software, Munich, November 1988.

[2] W.R. Cheswick, S.M. Bellovin. Firewalls and Internet Security: Repelling the Wily Hacker. Addison-Wesley, 1994, 306 pages. ISBN 0-201-63357-4.

[3] D.E. Denning. An Intrusion-Detection Model. IEEE Transactions on Software Engineering, Vol. 13, No. 2, February 1987.

[4] Th.D. Garvey, T.F. Lunt. Model-Based Intrusion Detection. Proceedings of the 14th National Computer Security Conference, Washington DC, October 1991.

[5] T. Lunt, J. van Horne, L. Halme. Automated Analysis of Computer System Audit Trails. Proceedings of the 9th DOE Computer Security Group Conference, May 1986.

[6] T.F. Lunt, R. Jagannathan. A Prototype Real-Time Intrusion Detection Expert System. Proceedings of the 1988 IEEE Symposium on Security and Privacy, April 1988.

[7] T.F. Lunt. Automated Audit Trail Analysis and Intrusion Detection: A Survey. Proceedings of the 11th National Computer Security Conference, Baltimore, MD, October 1988.

[8] T.F. Lunt. Real Time Intrusion Detection. Proceedings of COMPCON Spring '89, San Francisco, CA, February 1989.

[9] T.F. Lunt et al. A Real-Time Intrusion Detection Expert System. Interim Progress Report, Computer Science Laboratory, SRI International, Menlo Park, CA, May 1990.

[10] N. Habra, B. Le Charlier, A. Mounji. Preliminary Report on Advanced Security Audit Trail Analysis on Unix. 15.12.91, 34 pages.

[11] N. Habra, B. Le Charlier, A. Mounji. Advanced Security Audit Trail Analysis on Unix: Implementation Design of the NADF Evaluator. March 1993, 62 pages.

[12] N. Habra, B. Le Charlier, I. Mathieu, A. Mounji. ASAX: Software Architecture and Rule-based Language for Universal Audit Trail Analysis. Proceedings of the Second European Symposium on Research in Computer Security (ESORICS), Toulouse, France, November 1992.

[13] A. Mounji, B. Le Charlier, D. Zampunieris, N. Habra. Preliminary Report on Advanced Security Audit Trail Analysis on Unix. 15.12.91, 34 pages.

[14] Marshall T. Rose. The Open Book: A Practical Perspective on OSI. Prentice-Hall, 1990, 651 pages. ISBN 0-13-643016-3.

[15] A. Beguelin, J. Dongarra, A. Geist, R. Manchek, V. Sunderam. A User's Guide to PVM (Parallel Virtual Machine). ORNL/TM-11826, July 1991, 13 pages.

[16] Sun Microsystems. Network Programming Guide. Part Number 800-3850-10, Revision A, 27 March 1990.

[17] Craig C. Douglas, Timothy G. Mattson, Martin H. Schultz. Parallel Programming Systems for Workstation Clusters. Yale University Department of Computer Science, Research Report YALEU/DCS/TR-975, August 1993, 36 pages.

[18] J.R. Winkler. A Unix Prototype for Intrusion and Anomaly Detection in Secure Networks. Planning Research Corporation, R&D, 1990.