DOI: 10.1145/775047.775128

Finding surprising patterns in a time series database in linear time and space

Published: 23 July 2002

Abstract

The problem of finding a specified pattern in a time series database (i.e., query by content) has received much attention and is now a relatively mature field. In contrast, the important problem of enumerating all surprising or interesting patterns has received far less attention. This problem requires a meaningful definition of "surprise" and an efficient search technique. All previous attempts at finding surprising patterns in time series use a very limited notion of surprise and/or do not scale to massive datasets. To overcome these limitations, we introduce a novel technique that defines a pattern as surprising if the frequency of its occurrence differs substantially from that expected by chance, given some previously seen data.
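The definition of surprise in the abstract can be illustrated with a small sketch. It is not the paper's algorithm: the abstract does not give the scoring formula or the data structure, so the code below makes its own assumptions. It assumes the time series has already been discretized into a string of symbols, estimates a first-order Markov model from the previously seen (reference) string, and scores a pattern by the difference between its observed count in new data and its expected count under that model. The function names `count_occurrences`, `expected_count`, and `surprise` are hypothetical, and the naive counting here is not linear in time or space.

```python
from collections import Counter


def count_occurrences(text, pattern):
    """Count possibly overlapping occurrences of pattern in text."""
    m = len(pattern)
    return sum(1 for i in range(len(text) - m + 1) if text[i:i + m] == pattern)


def expected_count(reference, pattern, n):
    """Expected occurrences of a non-empty pattern in a string of length n,
    under a first-order Markov model estimated from a non-empty reference
    string (an illustrative assumption, not the paper's stated model)."""
    unigrams = Counter(reference)
    bigrams = Counter(zip(reference, reference[1:]))
    # How often each symbol appears with a successor in the reference.
    starts = Counter(reference[:-1])

    # Probability of the first symbol, then chain the transition probabilities.
    prob = unigrams[pattern[0]] / len(reference)
    for a, b in zip(pattern, pattern[1:]):
        if starts[a] == 0:
            return 0.0
        prob *= bigrams[(a, b)] / starts[a]

    # The pattern can start at n - len(pattern) + 1 positions.
    return (n - len(pattern) + 1) * prob


def surprise(reference, test, pattern):
    """Observed minus expected count; a large magnitude suggests surprise."""
    observed = count_occurrences(test, pattern)
    return observed - expected_count(reference, pattern, len(test))


if __name__ == "__main__":
    reference = "abababab"   # previously seen data, already symbolized
    test = "abaaabab"        # new data to inspect
    print(surprise(reference, test, "aa"))  # prints 2.0
    print(surprise(reference, test, "ab"))  # prints -0.5
```

In this toy example the pattern "aa" never follows from the reference model, so its two occurrences in the new data stand out, while "ab" occurs about as often as the model expects. A raw difference is only one possible score; a normalized score such as a z-score would be another reasonable choice.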

Published In

KDD '02: Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining
July 2002
719 pages
ISBN: 158113567X
DOI: 10.1145/775047

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Markov model
2. anomaly detection
3. feature extraction
4. novelty detection
5. suffix tree
6. time series

Acceptance Rates

KDD '02 paper acceptance rate: 44 of 307 submissions (14%)
Overall acceptance rate: 1,133 of 8,635 submissions (13%)
