DOI: 10.1145/2339530.2339587
research-article

Detecting changes of clustering structures using normalized maximum likelihood coding

Published: 12 August 2012

Abstract

    We are concerned with the issue of detecting changes of clustering structures in multivariate time series. From the viewpoint of the minimum description length (MDL) principle, we propose an algorithm that tracks changes of clustering structures so that the sum of the code-length for the data and the code-length for the clustering changes is minimized. Here we employ a Gaussian mixture model (GMM) as the representation of a clustering, and compute the code-length for data sequences using normalized maximum likelihood (NML) coding. The proposed algorithm enables us to deal with clustering dynamics, including the merging, splitting, emergence, and disappearance of clusters, from the unified viewpoint of the MDL principle. We empirically demonstrate using artificial data sets that our proposed method is able to detect cluster changes significantly more accurately than an existing statistical-test-based method and AIC/BIC-based methods. We further use real customers' transaction data sets to demonstrate the validity of our algorithm in market analysis. We show that it is able to detect changes of customer groups, which correspond to changes in the real market environment.
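The objective stated in the abstract, minimizing the code-length for the data plus the code-length for the clustering changes, can be illustrated with a small dynamic program. The sketch below is not the paper's algorithm: it assumes the per-step data code-lengths are already available as a table (in the paper they would come from NML coding of a GMM), and it charges a single hypothetical constant, `change_cost`, whenever the chosen model changes. The function name and all parameters are illustrative assumptions.

```python
from typing import List


def track_models(data_cost: List[List[float]],
                 change_cost: float,
                 stay_cost: float = 0.0) -> List[int]:
    """Pick one model index per time step so that the total code-length
    (data cost plus transition cost) is minimal.

    data_cost[t][k] is an assumed, precomputed code-length of the data at
    time t under candidate model k (for example, a GMM with k+1 clusters).
    change_cost and stay_cost are hypothetical constant code-lengths for
    switching models or keeping the same one.
    """
    num_models = len(data_cost[0])
    best = list(data_cost[0])          # best total cost ending in each model
    back: List[List[int]] = []         # back-pointers for path recovery

    for t in range(1, len(data_cost)):
        new_best, pointers = [], []
        for k in range(num_models):
            # Cost of arriving at model k from every possible predecessor j.
            arrive = [best[j] + (stay_cost if j == k else change_cost)
                      for j in range(num_models)]
            j_star = min(range(num_models), key=arrive.__getitem__)
            new_best.append(arrive[j_star] + data_cost[t][k])
            pointers.append(j_star)
        best = new_best
        back.append(pointers)

    # Trace back from the cheapest final model.
    k = min(range(num_models), key=best.__getitem__)
    path = [k]
    for pointers in reversed(back):
        k = pointers[k]
        path.append(k)
    path.reverse()
    return path


# Two candidate models; the data favors model 0 first and model 1 later.
costs = [[1.0, 5.0], [1.0, 5.0], [5.0, 1.0], [5.0, 1.0]]
print(track_models(costs, change_cost=2.0))
```

A change point is reported wherever consecutive entries of the returned path differ. A larger `change_cost` makes the tracker more reluctant to switch models on the strength of a brief improvement in data code-length.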




      Published In

      KDD '12: Proceedings of the 18th ACM SIGKDD international conference on Knowledge discovery and data mining
      August 2012
      1616 pages
      ISBN: 9781450314626
      DOI: 10.1145/2339530

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. clustering
      2. dynamic model selection
      3. minimum description length principle
      4. normalized maximum likelihood

      Conference

      KDD '12

      Acceptance Rates

      Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

