Article, KDD Conference Proceedings
DOI: 10.1145/1081870.1081953

Optimizing time series discretization for knowledge discovery

Published: 21 August 2005

Abstract

Knowledge discovery in time series usually requires symbolic time series. Many discretization methods that convert numeric time series to symbolic time series ignore the temporal order of values. This often leads to symbols that do not correspond to states of the process generating the time series and therefore cannot be interpreted meaningfully. We propose Persist, a new method for meaningful unsupervised discretization of numeric time series. The algorithm is based on the Kullback-Leibler divergence between the marginal and the self-transition probability distributions of the discretization symbols. Its performance is evaluated on both artificial and real-life data in comparison to the most common discretization methods. Persist achieves significantly higher accuracy than existing static methods and is robust against noise. It also outperforms Hidden Markov Models in all but very simple cases.
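The abstract names the core idea of Persist: for each symbol, compare its marginal probability with its self-transition probability (the chance that the symbol is immediately followed by itself) using the Kullback-Leibler divergence. The Python sketch below illustrates that idea under our own assumptions and is not the authors' implementation. It treats each symbol's two probabilities as Bernoulli distributions, compares them with the symmetric KL divergence, signs the result by whether the symbol persists more or less than chance, averages over symbols, and adds cut points greedily from a set of empirical-quantile candidates. The exact score and search strategy in the paper may differ. All function names (`symmetric_kl_bernoulli`, `persistence_score`, `persist_like_cuts`, `make_two_state_series`) and parameter defaults are hypothetical.

```python
import bisect
import math
import random


def symmetric_kl_bernoulli(p, q, eps=1e-12):
    """Symmetric KL divergence KL(P||Q) + KL(Q||P) between Bernoulli(p) and Bernoulli(q)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    q = min(max(q, eps), 1.0 - eps)
    # For two-point distributions the sum simplifies to (p - q) * (logit(p) - logit(q)).
    return (p - q) * (math.log(p / (1.0 - p)) - math.log(q / (1.0 - q)))


def symbolize(values, cuts):
    """Map each value to a bin index 0..len(cuts); cuts must be sorted ascending."""
    return [bisect.bisect_right(cuts, v) for v in values]


def persistence_score(symbols, k):
    """Assumed score: mean over symbols j of sgn(P(j|j) - P(j)) * SKL(P(j|j), P(j))."""
    n = len(symbols)
    total = 0.0
    for j in range(k):
        count_j = symbols.count(j)
        starts = symbols[:-1].count(j)  # occurrences of j that have a successor
        stays = sum(1 for a, b in zip(symbols, symbols[1:]) if a == j and b == j)
        if count_j == 0 or starts == 0:
            return float("-inf")  # empty or degenerate bin: reject this set of cuts
        p_marginal = count_j / n
        p_self = stays / starts
        sign = (p_self > p_marginal) - (p_self < p_marginal)  # +1, 0 or -1
        total += sign * symmetric_kl_bernoulli(p_self, p_marginal)
    return total / k


def quantile_candidates(values, m=99):
    """Assumption: candidate cuts are the m interior empirical quantiles of the data."""
    xs = sorted(values)
    n = len(xs)
    return sorted({xs[min(n - 1, (i * n) // (m + 1))] for i in range(1, m + 1)})


def persist_like_cuts(values, k, m=99):
    """Greedily add k - 1 cuts, each time picking the candidate with the best mean score."""
    candidates = quantile_candidates(values, m)
    cuts = []
    for _ in range(k - 1):
        best_cut, best_score = None, float("-inf")
        for c in candidates:
            if c in cuts:
                continue
            trial = sorted(cuts + [c])
            score = persistence_score(symbolize(values, trial), len(trial) + 1)
            if score > best_score:
                best_cut, best_score = c, score
        if best_cut is None:
            break  # no candidate produced a valid discretization
        cuts = sorted(cuts + [best_cut])
    return cuts


def make_two_state_series(n, seed=0, switch_prob=0.02, noise=0.5):
    """Hypothetical test signal: a persistent two-state process (levels 0 and 3) plus Gaussian noise."""
    rng = random.Random(seed)
    state, out = 0, []
    for _ in range(n):
        if rng.random() < switch_prob:
            state = 1 - state
        out.append(3.0 * state + rng.gauss(0.0, noise))
    return out


if __name__ == "__main__":
    xs = make_two_state_series(1000)
    print(persist_like_cuts(xs, k=2))
```

On the noisy two-state signal, the single cut chosen by `persist_like_cuts(xs, k=2)` should fall in the gap between the two levels. Shuffling the same values leaves any purely value-based binning (such as equal-frequency cuts) unchanged, yet it should drive the persistence score of that cut toward zero. This is the loss of temporal information that the abstract attributes to static methods.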

Published In

KDD '05: Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
August 2005
844 pages
ISBN: 159593135X
DOI: 10.1145/1081870

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery
New York, NY, United States

Author Tags

1. discretization
2. persistence
3. time series

Qualifiers

• Article

Conference

KDD '05

Acceptance Rates

Overall Acceptance Rate: 1,133 of 8,635 submissions (13%)

Article Metrics

• Downloads (last 12 months): 37
• Downloads (last 6 weeks): 3

(Reflects downloads up to 01 Sep 2024.)

Cited By

• (2023) The Semantic Adjacency Criterion in Time Intervals Mining. Big Data and Cognitive Computing 7:4 (173). DOI: 10.3390/bdcc7040173. Online publication date: 9-Nov-2023.
• (2023) A Novel Embedded Discretization-Based Deep Learning Architecture for Multivariate Time Series Classification. IEEE Transactions on Industrial Informatics 19:4 (5976-5984). DOI: 10.1109/TII.2022.3188839. Online publication date: Apr-2023.
• (2023) Classification of Event Sequences Based on Temporal Relation Features. 2023 IEEE International Conference on Big Data and Smart Computing (BigComp) (18-25). DOI: 10.1109/BigComp57234.2023.00052. Online publication date: Feb-2023.
• (2023) Predicting unplanned readmissions in the intensive care unit: a multimodality evaluation. Scientific Reports 13:1. DOI: 10.1038/s41598-023-42372-y. Online publication date: 18-Sep-2023.
• (2023) Predictive temporal patterns discovery. Expert Systems with Applications 226 (119974). DOI: 10.1016/j.eswa.2023.119974. Online publication date: Sep-2023.
• (2023) TIRPClo: efficient and complete mining of time intervals-related patterns. Data Mining and Knowledge Discovery 37:5 (1806-1857). DOI: 10.1007/s10618-023-00944-6. Online publication date: 30-Jun-2023.
• (2023) Selected Aspects of Interactive Feature Extraction. Transactions on Rough Sets XXIII (121-287). DOI: 10.1007/978-3-662-66544-2_8. Online publication date: 1-Jan-2023.
• (2023) Overview of Time Series Classification Based on Symbolic Discretization for ECG Applications. Advances in Computational Collective Intelligence (740-752). DOI: 10.1007/978-3-031-41774-0_58. Online publication date: 22-Sep-2023.
• (2022) Development of fully convolutional neural networks based on discretization in time series classification. IEEE Transactions on Knowledge and Data Engineering (1-1). DOI: 10.1109/TKDE.2022.3177724. Online publication date: 2022.
• (2022) TA4L. Knowledge-Based Systems 244:C. DOI: 10.1016/j.knosys.2022.108554. Online publication date: 23-May-2022.