Abstract
Density-based clustering is a family of cluster analysis methods that can discover clusters of arbitrary shape and is insensitive to noise. As data sets keep growing, the efficiency of data mining algorithms becomes increasingly important. In this paper, we present a new fast clustering algorithm called CURD (Clustering Using References and Density). Its novelty lies in capturing the shape and extent of each cluster with references and then analyzing the data based on those references. CURD preserves the advantages of density-based clustering, and its nearly linear time complexity makes it efficient enough for mining very large databases. Both our theoretical analysis and our experimental results confirm that CURD can discover clusters of arbitrary shape and is insensitive to noise. Moreover, it runs much faster than the R*-tree based DBSCAN algorithm.
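The abstract gives only the outline of the method: summarize the data with references, then work on the references instead of the raw points. The sketch below illustrates that general idea and should not be read as the CURD algorithm itself. The function name `reference_density_clusters`, the parameters `radius` and `min_count`, the choice of the first point seen as the reference position, and the `2 * radius` linking rule are all assumptions made for illustration. The linear scan over references also means this toy version does not reproduce the nearly linear running time claimed for CURD.

```python
import math


def reference_density_clusters(points, radius, min_count):
    """Toy reference-and-density clustering (illustrative only, not CURD).

    Returns one cluster label per input point; -1 marks noise.
    """
    refs = []    # reference positions (assumption: first point seen in a neighbourhood)
    counts = []  # how many points each reference has absorbed (its density)
    owner = []   # for each input point, the index of the reference that absorbed it

    # Pass 1: absorb each point into the first reference within `radius`,
    # or start a new reference at that point.
    for p in points:
        for i, r in enumerate(refs):
            if math.dist(p, r) <= radius:
                counts[i] += 1
                owner.append(i)
                break
        else:
            refs.append(p)
            counts.append(1)
            owner.append(len(refs) - 1)

    # Pass 2: keep only dense references; sparse ones are treated as noise.
    dense = [i for i, c in enumerate(counts) if c >= min_count]

    # Pass 3: link dense references that are close together (union-find),
    # so that a chain of references can trace an arbitrarily shaped cluster.
    parent = {i: i for i in dense}

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for pos, a in enumerate(dense):
        for b in dense[pos + 1:]:
            if math.dist(refs[a], refs[b]) <= 2 * radius:
                parent[find(a)] = find(b)

    # Pass 4: each point inherits the cluster of its reference.
    cluster_ids = {}
    labels = []
    for o in owner:
        if o not in parent:
            labels.append(-1)
        else:
            root = find(o)
            labels.append(cluster_ids.setdefault(root, len(cluster_ids)))
    return labels
```

Called on two well-separated groups of points plus one isolated point, with `radius=1.0` and `min_count=3`, the sketch assigns each group its own label and marks the isolated point as noise. Only the references are compared pairwise, so the linking step depends on the number of references and not on the number of points.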
Supported by the National High Technology Development 863 Program of China under Grant No. 2002AA4Z3440; the Foundation of the innovation research institute of PKU-IBM; the National Grand Fundamental Research 973 Program of China under Grant No. G1999032705.
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Ma, S., Wang, T., Tang, S., Yang, D., Gao, J. (2003). A New Fast Clustering Algorithm Based on Reference and Density. In: Dong, G., Tang, C., Wang, W. (eds) Advances in Web-Age Information Management. WAIM 2003. Lecture Notes in Computer Science, vol 2762. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-45160-0_21
DOI: https://doi.org/10.1007/978-3-540-45160-0_21
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-40715-7
Online ISBN: 978-3-540-45160-0
eBook Packages: Springer Book Archive