Density-based hierarchical clustering for streaming data

Published: 01 April 2012

Abstract

For streaming data that arrive continuously, such as multimedia data and financial transactions, clustering algorithms are typically allowed to scan the data set only once. Existing research in this domain mainly focuses on improving the accuracy of clustering. In this paper, a novel density-based hierarchical clustering scheme for streaming data is proposed to improve both accuracy and effectiveness; it is based on the agglomerative clustering framework. Clustering algorithms for streaming data have traditionally used the cluster center to represent the whole cluster when merging clusters, which may lead to unsatisfactory results. We argue that even if the data set is accessed only once, some parameters, such as the within-cluster variance, the intra-cluster density and the inter-cluster distance, can be calculated accurately, and that this can bring measurable benefits to the cluster-merging process. Furthermore, we employ a general framework that can incorporate different criteria and that, given the same criteria, produces similar clustering results for both streaming and non-streaming data. In experimental studies, the proposed method demonstrates promising results with reduced time and space complexity.
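
The claim that some cluster statistics can be computed exactly in a single scan can be illustrated with additive per-cluster summaries. The sketch below is an assumption, not the authors' method. For each cluster it keeps the count n, the per-dimension linear sum LS and the scalar sum of squared norms SS (the "clustering feature" idea known from BIRCH). From these, the centroid and the within-cluster variance follow exactly, and two clusters merge by adding their summaries. Ward's increase-in-SSE cost stands in as one example of a pluggable merge criterion. The intra-cluster density is not modeled because the abstract does not define it. All names (`ClusterSummary`, `ward_cost`, `merge_cheapest_pair`, `stream_cluster`, `max_clusters`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Sequence


@dataclass
class ClusterSummary:
    """Hypothetical additive summary of one cluster, updated in a single pass."""
    n: int            # number of points absorbed so far
    ls: List[float]   # per-dimension linear sum of the points
    ss: float         # sum of squared Euclidean norms of the points

    @classmethod
    def from_point(cls, x: Sequence[float]) -> "ClusterSummary":
        return cls(1, [float(v) for v in x], sum(float(v) * float(v) for v in x))

    def merge(self, other: "ClusterSummary") -> "ClusterSummary":
        # The summaries are additive, so a merge is exact without the raw points.
        return ClusterSummary(
            self.n + other.n,
            [a + b for a, b in zip(self.ls, other.ls)],
            self.ss + other.ss,
        )

    def centroid(self) -> List[float]:
        return [v / self.n for v in self.ls]

    def sse(self) -> float:
        # Sum of squared distances to the centroid: SS - |LS|^2 / n.
        # Clamped at zero to absorb floating-point round-off.
        return max(0.0, self.ss - sum(v * v for v in self.ls) / self.n)

    def variance(self) -> float:
        # Within-cluster variance as the mean squared distance to the centroid.
        return self.sse() / self.n


Criterion = Callable[[ClusterSummary, ClusterSummary], float]


def ward_cost(a: ClusterSummary, b: ClusterSummary) -> float:
    # Ward's criterion: increase in total SSE if a and b are merged,
    # n_a * n_b / (n_a + n_b) * |c_a - c_b|^2.
    d2 = sum((p - q) ** 2 for p, q in zip(a.centroid(), b.centroid()))
    return a.n * b.n / (a.n + b.n) * d2


def merge_cheapest_pair(clusters: List[ClusterSummary], criterion: Criterion) -> None:
    # Find the pair with the lowest merge cost and replace it by its merge.
    # Assumes len(clusters) >= 2.
    best = None
    for i in range(len(clusters)):
        for j in range(i + 1, len(clusters)):
            cost = criterion(clusters[i], clusters[j])
            if best is None or cost < best[0]:
                best = (cost, i, j)
    _, i, j = best
    merged = clusters[i].merge(clusters[j])
    del clusters[j]  # j > i, so index i is still valid afterwards
    clusters[i] = merged


def stream_cluster(stream: Iterable[Sequence[float]], max_clusters: int,
                   criterion: Criterion = ward_cost) -> List[ClusterSummary]:
    # Each point is read exactly once; memory is bounded by max_clusters (>= 1)
    # summaries, and no raw point is kept after it has been absorbed.
    clusters: List[ClusterSummary] = []
    for x in stream:
        clusters.append(ClusterSummary.from_point(x))
        if len(clusters) > max_clusters:
            merge_cheapest_pair(clusters, criterion)
    return clusters


if __name__ == "__main__":
    data = [[0.0, 0.0], [0.0, 2.0], [10.0, 10.0], [10.0, 12.0]]
    for c in stream_cluster(data, max_clusters=2):
        print(c.n, c.centroid(), c.variance())
    # prints:
    # 2 [0.0, 1.0] 1.0
    # 2 [10.0, 11.0] 1.0
```

Passing another function with the same signature in place of `ward_cost` changes the merge criterion without touching the streaming loop. This loosely mirrors the abstract's point that one framework can host different criteria. The pairwise search costs O(max_clusters^2) criterion evaluations per arriving point. That is acceptable for a sketch but is not tuned for throughput.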

Publisher

Elsevier Science Inc.

United States

Author Tags

  1. Density-based clustering
  2. Hierarchical method
  3. Streaming data

Cited By

  • (2021) IM-c-means: a new clustering algorithm for clusters with skewed distributions. Pattern Analysis & Applications 24(2), pp. 611-623. DOI: 10.1007/s10044-020-00932-2. Online publication date: 1 May 2021.
  • (2019) Hyper-cylindrical micro-clustering for streaming data with unscheduled data removals. Knowledge-Based Systems 99(C), pp. 183-200. DOI: 10.1016/j.knosys.2016.02.004. Online publication date: 1 January 2019.
  • (2018) Variance and density-based anomaly identification and ranking for evolving data streams. International Journal of Computational Intelligence Studies 3(2/3), pp. 251-274. DOI: 10.1504/IJCISTUDIES.2014.062734. Online publication date: 13 December 2018.
  • (2018) A potential-based clustering method with hierarchical optimization. World Wide Web 21(6), pp. 1617-1635. DOI: 10.1007/s11280-017-0509-2. Online publication date: 1 November 2018.
  • (2017) Fat node leading tree for data stream clustering with density peaks. Knowledge-Based Systems 120(C), pp. 99-117. DOI: 10.1016/j.knosys.2016.12.025. Online publication date: 15 March 2017.
  • (2017) A hybrid bio-inspired algorithm and its application. Applied Intelligence 47(4), pp. 1059-1067. DOI: 10.1007/s10489-017-0951-y. Online publication date: 1 December 2017.
  • (2015) A modified fuzzy min-max neural network for data clustering and its application to power quality monitoring. Applied Soft Computing 28(C), pp. 19-29. DOI: 10.1016/j.asoc.2014.09.050. Online publication date: 1 March 2015.
