DOI: 10.1145/1321440.1321551

Grid-based subspace clustering over data streams

Published: 06 November 2007

Abstract

A real-life data stream usually contains many dimensions, and some dimensional values of its data elements may be missing. To effectively extract the ongoing change of a data stream with respect to all subsets of its dimensions, this paper proposes a grid-based subspace clustering algorithm. Given an n-dimensional data stream, the ongoing distribution statistics of data elements in each one-dimensional data space are first monitored by a list of grid-cells called a sibling list. Once a dense grid-cell of a first-level sibling list becomes a dense unit grid-cell, new second-level sibling lists are created as its child nodes in order to trace any cluster in all possible two-dimensional rectangular subspaces. In this way, a sibling tree grows up to the nth level at most, and a k-dimensional subcluster can be found in the kth level of the sibling tree. The proposed method is comparatively analyzed through a series of experiments to identify its various characteristics.
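The sibling-tree idea described in the abstract can be illustrated with a rough sketch. This is not the paper's algorithm: it assumes fixed-width grid-cells, a plain count threshold for density, no partitioning of dense cells into unit cells, and no decay of old elements. Child lists are created only for higher-indexed dimensions so that each subspace is traced once. The names `SiblingList`, `SiblingTree`, `cell_width`, and `min_count` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


class SiblingList:
    """Grid-cells over one dimension (hypothetical simplification)."""

    def __init__(self, dim):
        self.dim = dim
        self.counts = defaultdict(int)  # cell index -> elements seen so far
        self.children = {}              # dense cell index -> {dim: SiblingList}


class SiblingTree:
    """Assumed sketch: fixed cell width and a count-based density threshold."""

    def __init__(self, n_dims, cell_width, min_count):
        self.n_dims = n_dims
        self.cell_width = cell_width
        self.min_count = min_count
        # First level: one sibling list per one-dimensional data space.
        self.roots = [SiblingList(d) for d in range(n_dims)]

    def insert(self, element):
        """Process one stream element; None marks a missing dimensional value."""
        for root in self.roots:
            self._update(root, element)

    def _update(self, node, element):
        value = element[node.dim]
        if value is None:
            return  # missing value: this subspace cannot be updated
        cell = int(value // self.cell_width)
        node.counts[cell] += 1
        if node.counts[cell] < self.min_count:
            return  # not dense yet, so no deeper level is traced
        if cell not in node.children:
            # The cell is dense: create next-level sibling lists as children.
            node.children[cell] = {
                d: SiblingList(d) for d in range(node.dim + 1, self.n_dims)
            }
        for child in node.children[cell].values():
            self._update(child, element)

    def dense_cells(self):
        """Return paths of (dim, cell) pairs; a path of length k is k-dimensional."""
        found = []

        def walk(node, path):
            for cell, count in node.counts.items():
                if count >= self.min_count:
                    here = path + ((node.dim, cell),)
                    found.append(here)
                    for child in node.children.get(cell, {}).values():
                        walk(child, here)

        for root in self.roots:
            walk(root, ())
        return found


tree = SiblingTree(n_dims=2, cell_width=1.0, min_count=3)
stream = [(0.2, 2.1), (0.4, 2.3), (0.6, None), (0.8, 2.7), (0.9, 2.9), (0.1, 2.2)]
for element in stream:
    tree.insert(element)
cells = tree.dense_cells()
```

In this sketch a child list only counts elements that arrive after its parent cell has become dense, so a two-dimensional dense cell appears later than its one-dimensional parent. The element with a missing second value still contributes to the statistics of the first dimension.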


    Published In

    CIKM '07: Proceedings of the sixteenth ACM conference on Conference on information and knowledge management
    November 2007
    1048 pages
    ISBN:9781595938039
    DOI:10.1145/1321440
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. adaptive memory utilization
    2. clustering
    3. data mining
    4. data streams
    5. grid-based clustering

    Qualifiers

    • Research-article

    Conference

    CIKM07

    Acceptance Rates

    Overall Acceptance Rate 1,861 of 8,427 submissions, 22%

    Cited By

    • (2022) Efficient Gaussian Kernel Microcluster Real-Time Clustering Method for Industrial Internet of Things (IIoT) Streams. IEEE Internet of Things Journal 9:21 (21323-21337). DOI: 10.1109/JIOT.2022.3179320. Online publication date: 1-Nov-2022
    • (2020) A Malware Detection Method for Health Sensor Data Based on Machine Learning. 2020 IEEE International Conference on Informatics, IoT, and Enabling Technologies (ICIoT) (277-282). DOI: 10.1109/ICIoT48696.2020.9089478. Online publication date: Feb-2020
    • (2020) Detecting Arbitrarily Oriented Subspace Clusters in Data Streams Using Hough Transform. Advances in Knowledge Discovery and Data Mining (356-368). DOI: 10.1007/978-3-030-47426-3_28. Online publication date: 6-May-2020
    • (2019) Optimizing Data Stream Representation: An Extensive Survey on Stream Clustering Algorithms. Business & Information Systems Engineering 61:3 (277-297). DOI: 10.1007/s12599-019-00576-5. Online publication date: 21-Jan-2019
    • (2018) PCA-based hierarchical clustering of AIS trajectories with automatic extraction of clusters. 2018 IEEE 3rd International Conference on Big Data Analysis (ICBDA) (448-452). DOI: 10.1109/ICBDA.2018.8367725. Online publication date: Mar-2018
    • (2017) Versatile Hyper-Elliptic Clustering Approach for Streaming Data Based on One-Pass-Thrown-Away Learning. Journal of Classification 34:1 (108-147). DOI: 10.1007/s00357-017-9222-1. Online publication date: 1-Apr-2017
    • (2015) A data stream outlier detection algorithm based on grid. The 27th Chinese Control and Decision Conference (2015 CCDC) (4136-4141). DOI: 10.1109/CCDC.2015.7162657. Online publication date: May-2015
    • (2015) Subspace clustering of data streams. Journal of Intelligent Information Systems 45:3 (319-335). DOI: 10.1007/s10844-014-0319-2. Online publication date: 1-Dec-2015
    • (2013) Clustering Data Streams Using Mass Estimation. Proceedings of the 2013 15th International Symposium on Symbolic and Numeric Algorithms for Scientific Computing (289-295). DOI: 10.1109/SYNASC.2013.45. Online publication date: 23-Sep-2013
    • (2013) Effective Evaluation Measures for Subspace Clustering of Data Streams. Revised Selected Papers of PAKDD 2013 International Workshops on Trends and Applications in Knowledge Discovery and Data Mining - Volume 7867 (342-353). DOI: 10.1007/978-3-642-40319-4_30. Online publication date: 14-Apr-2013
