Abstract
Cluster analysis has long played an important role in a wide variety of data applications. When clusters are irregular or intertwined, density-based clustering has proven much more effective. The quality of the clustering result depends on an adequate choice of parameters, yet without sufficient domain knowledge these parameters are difficult to set in practice. This paper proposes a new method that automatically finds the optimal value of the bandwidth parameter. The method builds a parameter-estimation model and uses it to infer the most suitable value. Based on Bayes' theorem, the most probable value of the bandwidth is obtained from the inherent distribution characteristics of the original data set. Clusters can then be identified using the determined parameter values. Experimental results show that the proposed method offers complementary advantages for density-based clustering algorithms.
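To make the idea of a "most probable" bandwidth concrete, the following minimal Python sketch picks the maximum a posteriori bandwidth for a one-dimensional Gaussian kernel density estimate. It is an illustration under stated assumptions, not the model constructed in the paper: the leave-one-out likelihood, the flat prior over a candidate grid, and the names loo_log_likelihood and map_bandwidth are all hypothetical choices made here.

```python
import numpy as np


def loo_log_likelihood(x, h):
    """Leave-one-out log-likelihood of 1-D data x under a Gaussian KDE with bandwidth h."""
    n = x.size
    diff = (x[:, None] - x[None, :]) / h
    k = np.exp(-0.5 * diff ** 2) / (np.sqrt(2.0 * np.pi) * h)
    np.fill_diagonal(k, 0.0)           # exclude each point from its own density estimate
    dens = k.sum(axis=1) / (n - 1)
    return float(np.sum(np.log(dens + 1e-300)))  # small constant guards against log(0)


def map_bandwidth(x, grid, log_prior=None):
    """Return the grid bandwidth with the highest posterior, plus the normalized posterior."""
    x = np.asarray(x, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if log_prior is None:
        log_prior = np.zeros_like(grid)  # assumption: flat prior over the candidate grid
    # Bayes' theorem in log form: log posterior = log likelihood + log prior + constant
    log_post = np.array([loo_log_likelihood(x, h) for h in grid]) + log_prior
    log_post -= log_post.max()           # stabilize before exponentiating
    post = np.exp(log_post)
    post /= post.sum()
    return float(grid[int(np.argmax(post))]), post


# Synthetic example: two well-separated Gaussian clusters (assumed data, not from the paper)
rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2.0, 0.5, 100), rng.normal(2.0, 0.5, 100)])
grid = np.linspace(0.05, 2.0, 40)
h_map, posterior = map_bandwidth(data, grid)
print("MAP bandwidth on the grid:", h_map)
```

The selected bandwidth could then be passed to a density-based clustering step. An informative prior can be supplied through log_prior when domain knowledge about plausible bandwidths is available.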
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Jin, H., Wang, S., Zhou, Q., Li, Y. (2011). Optimal Bandwidth Selection for Density-Based Clustering. In: Xu, J., Yu, G., Zhou, S., Unland, R. (eds) Database Systems for Advanced Applications. DASFAA 2011. Lecture Notes in Computer Science, vol 6637. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-20244-5_15
DOI: https://doi.org/10.1007/978-3-642-20244-5_15
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-20243-8
Online ISBN: 978-3-642-20244-5
eBook Packages: Computer Science, Computer Science (R0)