Characterizing the behavior of a program using multiple-length n-grams
C Marceau - Proceedings of the 2000 workshop on New security …, 2001 - dl.acm.org
C Marceau
Proceedings of the 2000 workshop on New security paradigms, 2001 • dl.acm.org
Abstract
Some recent advances in intrusion detection are based on detecting anomalies in program behavior, as characterized by the sequence of kernel calls the program makes. Specifically, traces of kernel calls are collected during a training period. The substrings of fixed length N (for some N) of those traces are called N-grams. The set of N-grams occurring during normal execution has been found to discriminate effectively between normal behavior of a program and the behavior of the program under attack. The N-gram characterization, while effective, requires the user to choose a suitable value for N. This paper presents an alternative characterization, as a finite state machine whose states represent predictive sequences of different lengths. An algorithm is presented to construct the finite state machine from training data, based on traditional string-processing data structures but employing some novel techniques.
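The fixed-length N-gram baseline described in the abstract can be sketched as follows. This is only an illustration of the fixed-N characterization, not the finite state machine construction the paper proposes. The function names (`ngrams`, `train`, `anomalies`), the example call names, and the choice of N = 3 are assumptions made for this sketch and do not come from the paper.

```python
from typing import Iterable, List, Sequence, Set, Tuple

NGram = Tuple[str, ...]


def ngrams(trace: Sequence[str], n: int) -> List[NGram]:
    """Return every contiguous length-n window of a trace, in order."""
    return [tuple(trace[i:i + n]) for i in range(len(trace) - n + 1)]


def train(traces: Iterable[Sequence[str]], n: int) -> Set[NGram]:
    """Collect the set of N-grams seen in the training traces."""
    normal: Set[NGram] = set()
    for trace in traces:
        normal.update(ngrams(trace, n))
    return normal


def anomalies(trace: Sequence[str], normal: Set[NGram], n: int) -> List[NGram]:
    """Return the N-grams of a trace that never occurred during training."""
    return [g for g in ngrams(trace, n) if g not in normal]


# Hypothetical kernel-call traces recorded during a training period.
training_traces = [
    ["open", "read", "read", "write", "close"],
    ["open", "read", "write", "close"],
]
normal = train(training_traces, n=3)

# A hypothetical trace containing a call that never appeared in training.
suspect = ["open", "read", "execve", "write", "close"]
unseen = anomalies(suspect, normal, n=3)
print(unseen)
```

The sketch makes the drawback named in the abstract concrete: `n` must be fixed by the user before training, and the same value must be used at detection time.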