
Reliable detection of episodes in event sequences

Published: 01 May 2005

Abstract

Suppose one wants to detect bad or suspicious subsequences in event sequences. Whether an observed pattern of activity (in the form of a particular subsequence) is significant and should be a cause for alarm depends on how likely it is to occur fortuitously. A long enough sequence of observed events will almost certainly contain any subsequence, so setting thresholds for alarm is an important issue in a monitoring system that seeks to avoid false alarms. Suppose a long sequence, T, of observed events contains a suspicious subsequence pattern, S, where S consists of m events and spans a window of size w within T. We address the fundamental problem: Is a certain number of occurrences of a particular subsequence unlikely to be generated by randomness itself (i.e., indicative of suspicious activity)? If the probability of an occurrence generated by randomness is high and an automated monitoring system flags it as suspicious anyway, then such a system will generate too many false alarms. This paper quantifies the probability of such an S occurring in T within a window of size w, the number of distinct windows containing S as a subsequence, the expected number of such occurrences, and its variance, and establishes its limiting distribution, which allows setting an alarm threshold so that the probability of false alarms is very small. We report on experiments confirming the theory and showing that we can detect bad subsequences with a low false alarm rate.
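
To make the quantities in the abstract concrete, the following Python sketch counts the windows of size w in T that contain S as a subsequence, then estimates the mean and spread of that count by simulation. It is an illustration only, not the paper's analytical method: the function names, the assumption that events are independent and uniformly distributed, and the z-score style threshold (which presumes a roughly normal count, something the abstract does not state) are all assumptions introduced here.

```python
import random
from statistics import mean, pstdev


def contains_subsequence(window, pattern):
    """Return True if `pattern` occurs in `window` as a subsequence
    (symbols in order, not necessarily adjacent)."""
    it = iter(window)
    # `symbol in it` advances the iterator, so order is respected.
    return all(symbol in it for symbol in pattern)


def count_windows(text, pattern, w):
    """Count the length-w windows of `text` that contain `pattern`
    as a subsequence."""
    return sum(
        contains_subsequence(text[i:i + w], pattern)
        for i in range(len(text) - w + 1)
    )


def simulate_threshold(pattern, alphabet, n, w, trials=200, z=3.0, seed=0):
    """Estimate mean and standard deviation of the window count over random
    sequences of length n, and return (mean, stdev, mean + z * stdev).

    Assumption (not from the paper): events are i.i.d. uniform over
    `alphabet`, and the alarm threshold is a simple z-score cut-off.
    """
    rng = random.Random(seed)
    counts = [
        count_windows(rng.choices(alphabet, k=n), pattern, w)
        for _ in range(trials)
    ]
    mu, sigma = mean(counts), pstdev(counts)
    return mu, sigma, mu + z * sigma


if __name__ == "__main__":
    observed = count_windows("abcab", "ab", 3)
    mu, sigma, threshold = simulate_threshold("ab", "abc", n=200, w=3)
    print(observed, mu, sigma, threshold)
```

In a monitoring setting, an observed count above the returned threshold would be flagged. The simulation stands in for the closed-form mean, variance and limiting distribution that the paper derives.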

Published In

Knowledge and Information Systems  Volume 7, Issue 4
May 2005
128 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Author Tags

  1. Data mining
  2. Episode pattern matching
  3. Hidden pattern matching
  4. Overrepresented and Underrepresented patterns
  5. Probabilistic analysis

Qualifiers

  • Article

Cited By

  • (2023) Bilateral‐Weighted Online Adaptive Isolation Forest for anomaly detection in streaming data. Statistical Analysis and Data Mining 16:3 (215-223). DOI: 10.1002/sam.11612. Online publication date: 17-May-2023
  • (2022) NLP Based Anomaly Detection for Categorical Time Series. 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI) (27-34). DOI: 10.1109/IRI54793.2022.00019. Online publication date: 9-Aug-2022
  • (2021) STAD: Spatio-Temporal Anomaly Detection Mechanism for Mobile Network Management. IEEE Transactions on Network and Service Management 18:1 (894-906). DOI: 10.1109/TNSM.2020.3048131. Online publication date: 1-Mar-2021
  • (2020) Filtering Infrequent Behavior in Business Process Discovery by Using the Minimum Expectation. International Journal of Cognitive Informatics and Natural Intelligence 14:2 (1-15). DOI: 10.4018/IJCINI.2020040101. Online publication date: 1-Apr-2020
  • (2020) Outlier Detection. ACM Computing Surveys 53:3 (1-37). DOI: 10.1145/3381028. Online publication date: 12-Jun-2020
  • (2017) Filtering Out Infrequent Behavior from Business Process Event Logs. IEEE Transactions on Knowledge and Data Engineering 29:2 (300-314). DOI: 10.1109/TKDE.2016.2614680. Online publication date: 1-Feb-2017
  • (2017) A novel methodology for stock investment using high utility episode mining and genetic algorithm. Applied Soft Computing 59:C (303-315). DOI: 10.1016/j.asoc.2017.05.032. Online publication date: 1-Oct-2017
  • (2016) Mixture of hyperspheres for novelty detection. Vietnam Journal of Computer Science 3:4 (223-233). DOI: 10.1007/s40595-016-0069-x. Online publication date: 1-Nov-2016
  • (2016) Skopus. Data Mining and Knowledge Discovery 30:5 (1086-1111). DOI: 10.1007/s10618-016-0467-9. Online publication date: 1-Sep-2016
  • (2015) Discovering utility-based episode rules in complex event sequences. Expert Systems with Applications: An International Journal 42:12 (5303-5314). DOI: 10.1016/j.eswa.2015.02.022. Online publication date: 15-Jul-2015
