DOI: 10.1145/1281192.1281210
Article

Density-based clustering for real-time stream data

Published: 12 August 2007

Abstract

Existing data-stream clustering algorithms such as CluStream are based on k-means. These algorithms cannot find clusters of arbitrary shape and cannot handle outliers. Further, they require the user to specify k and a time window in advance. To address these issues, this paper proposes D-Stream, a framework for clustering stream data using a density-based approach. The algorithm uses an online component, which maps each input data record into a grid, and an offline component, which computes the grid density and clusters the grids based on that density. The algorithm adopts a density decaying technique to capture the dynamic changes of a data stream. By exploiting the relationships between the decay factor, data density and cluster structure, the algorithm can efficiently and effectively generate and adjust the clusters in real time. Further, a theoretically sound technique is developed to detect and remove sporadic grids, that is, grids mapped to by outliers, which substantially improves the space and time efficiency of the system. The technique makes high-speed data stream clustering feasible without degrading the clustering quality. The experimental results show that the algorithm has superior quality and efficiency, can find clusters of arbitrary shapes, and can accurately recognize the evolving behaviors of real-time data streams.
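
The online/offline split and the density decay described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the abstract gives no formulas, so the exponential decay rule, the single density threshold, the face-adjacency notion of neighboring grids, and every name below (`GridDensityStream`, `cell_size`, `decay`, `dense_threshold`) are assumptions made for illustration. Sporadic-grid removal is not modeled.

```python
from collections import defaultdict, deque


class GridDensityStream:
    """Toy grid-based stream summary with decaying density (illustrative only)."""

    def __init__(self, cell_size=1.0, decay=0.998, dense_threshold=3.0):
        self.cell_size = cell_size              # assumed uniform grid width
        self.decay = decay                      # assumed per-time-unit decay factor in (0, 1)
        self.dense_threshold = dense_threshold  # assumed cutoff for a "dense" grid
        self.density = defaultdict(float)       # grid -> density as of its last update
        self.last_update = {}                   # grid -> time of its last update

    def _cell(self, point):
        # Map a record to the integer coordinates of the grid that contains it.
        return tuple(int(x // self.cell_size) for x in point)

    def insert(self, point, t):
        """Online step: decay the grid's stored density to time t, then add 1."""
        cell = self._cell(point)
        elapsed = t - self.last_update.get(cell, t)
        self.density[cell] = self.density[cell] * self.decay ** elapsed + 1.0
        self.last_update[cell] = t
        return cell

    def density_at(self, cell, t):
        """Density of a grid decayed to time t (0.0 if the grid was never hit)."""
        if cell not in self.last_update:
            return 0.0
        return self.density[cell] * self.decay ** (t - self.last_update[cell])

    def clusters(self, t):
        """Offline step: group dense grids that share a face into clusters."""
        dense = {c for c in self.last_update
                 if self.density_at(c, t) >= self.dense_threshold}
        seen, result = set(), []
        for start in dense:
            if start in seen:
                continue
            seen.add(start)
            component, queue = set(), deque([start])
            while queue:  # breadth-first search over neighboring dense grids
                cell = queue.popleft()
                component.add(cell)
                for axis in range(len(cell)):
                    for step in (-1, 1):
                        nb = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
                        if nb in dense and nb not in seen:
                            seen.add(nb)
                            queue.append(nb)
            result.append(component)
        return result
```

In this sketch a grid's density is decayed lazily, only when the grid is touched or queried, so inserting a record costs constant time regardless of how many grids exist. Because clusters are connected groups of dense grids rather than points near a centroid, they can take arbitrary shapes, and isolated records fall into grids that never reach the threshold.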

    Published In

    KDD '07: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining
    August 2007
    1080 pages
    ISBN: 9781595936097
    DOI: 10.1145/1281192
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. d-stream
    2. density-based clustering
    3. sporadic grids
    4. stream data mining

    Qualifiers

    • Article

    Conference

    KDD07

    Acceptance Rates

    KDD '07 Paper Acceptance Rate 111 of 573 submissions, 19%;
    Overall Acceptance Rate 1,133 of 8,635 submissions, 13%

    Cited By

    • (2024) Fractional Adaptive Resonance Theory (FRA-ART): An Extension for a Stream Clustering Method with Enhanced Data Representation. Mathematics 12:13 (2049). DOI: 10.3390/math12132049. Online publication date: 30-Jun-2024
    • (2024) Towards Metric DBSCAN: Exact, Approximate, and Streaming Algorithms. Proceedings of the ACM on Management of Data 2:3 (1-25). DOI: 10.1145/3654981. Online publication date: 30-May-2024
    • (2024) TWStream: Three-Way Stream Clustering. IEEE Transactions on Fuzzy Systems 32:9 (4927-4939). DOI: 10.1109/TFUZZ.2024.3369716. Online publication date: 1-Sep-2024
    • (2024) End-to-End System Level Solution for DDoS Attack Detection and Prediction for 5G and Future Wireless Networks. 2024 International Conference on Smart Applications, Communications and Networking (SmartNets) (1-7). DOI: 10.1109/SmartNets61466.2024.10577691. Online publication date: 28-May-2024
    • (2024) A Real-Time Radar Signal Sorting Method Under Bayesian Framework With Dynamic Cluster Merging. IEEE Sensors Journal 24:17 (27859-27869). DOI: 10.1109/JSEN.2024.3431021. Online publication date: 1-Sep-2024
    • (2024) Egg Shape as a Generalized Radial Basis Shape for Discard-After-Learn Method in Streaming Data. 2024 21st International Joint Conference on Computer Science and Software Engineering (JCSSE) (268-274). DOI: 10.1109/JCSSE61278.2024.10613706. Online publication date: 19-Jun-2024
    • (2024) Ocean: Online Clustering and Evolution Analysis for Dynamic Streaming Data. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (4504-4517). DOI: 10.1109/ICDE60146.2024.00343. Online publication date: 13-May-2024
    • (2024) An Efficient Fuzzy Stream Clustering Method Based on Granular-Ball Structure. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (901-913). DOI: 10.1109/ICDE60146.2024.00074. Online publication date: 13-May-2024
    • (2024) Towards Analysing Climate Change Temperature Patterns through Stream Clustering Methods. 2024 IEEE International Conference on Evolving and Adaptive Intelligent Systems (EAIS) (1-5). DOI: 10.1109/EAIS58494.2024.10570034. Online publication date: 23-May-2024
    • (2024) An Empirical Approach for Clustering-Based Time Series Summarisation Assessment. 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC) (279-284). DOI: 10.1109/COMPSAC61105.2024.00046. Online publication date: 2-Jul-2024
