Abstract
In data clustering, density-based algorithms are well known for their ability to detect clusters of arbitrary shape. DBSCAN is a widely used density-based clustering approach, and the recently proposed density peak (DP) algorithm has shown significant potential in experiments. However, DBSCAN may misclassify low-density border points as noise and does not work well when density varies widely across clusters, while the DP algorithm depends heavily on the detected cluster centers. To address these problems, we study the two algorithms and find that they have complementary properties. We therefore propose to combine them. Specifically, we use the DP algorithm to detect cluster centers and then determine the parameters for DBSCAN adaptively. After DBSCAN clustering, we again use the DP algorithm to include low-density border points in clusters. By combining the complementary properties of the two algorithms, we alleviate the problems of DBSCAN while avoiding the drawbacks of the DP algorithm. Our algorithm is tested on synthetic and real datasets and is shown to perform better than DBSCAN and the DP algorithm, as well as some other clustering algorithms.
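The three stages described in the abstract (DP center detection, adaptive DBSCAN, DP-based border assignment) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the DBSCAN parameters are derived or which border points are admitted, so the rule for `eps` (the largest `min_pts`-th nearest-neighbour distance among the detected centers) and the `border_factor` guard are assumptions made only for illustration, as are all function and parameter names.

```python
import math


def density_peak(dist, d_c):
    """Return (rho, delta, parent, order) in the style of the density peak algorithm."""
    n = len(dist)
    # rho: number of other points closer than the cutoff distance d_c
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c) for i in range(n)]
    # visit points by decreasing density; ties are broken by index (stable sort)
    order = sorted(range(n), key=lambda i: -rho[i])
    delta, parent = [0.0] * n, [-1] * n
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = max(dist[i])  # densest point: use its largest distance
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])  # nearest denser point
        delta[i], parent[i] = dist[i][j], j
    return rho, delta, parent, order


def dbscan(dist, eps, min_pts):
    """Plain DBSCAN on a distance matrix; noise is labelled -1."""
    n = len(dist)
    neigh = [[j for j in range(n) if dist[i][j] <= eps] for i in range(n)]
    core = [len(nb) >= min_pts for nb in neigh]  # neighbourhood includes the point itself
    labels, cluster = [-1] * n, 0
    for i in range(n):
        if labels[i] != -1 or not core[i]:
            continue
        labels[i] = cluster
        stack = [i]
        while stack:
            p = stack.pop()
            if not core[p]:
                continue  # border points join a cluster but do not expand it
            for q in neigh[p]:
                if labels[q] == -1:
                    labels[q] = cluster
                    stack.append(q)
        cluster += 1
    return labels


def dp_dbscan_sketch(points, d_c, n_centers, min_pts=3, border_factor=2.0):
    """Hypothetical combination of DP and DBSCAN following the abstract's outline."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho, delta, parent, order = density_peak(dist, d_c)
    # Stage 1: centers are the points with the largest rho * delta
    centers = sorted(range(n), key=lambda i: -rho[i] * delta[i])[:n_centers]
    # Stage 2 (ASSUMPTION): eps is the largest min_pts-th neighbour distance of a center
    eps = max(sorted(dist[c])[min_pts - 1] for c in centers)
    labels = dbscan(dist, eps, min_pts)
    # Stage 3 (ASSUMPTION): a noise point inherits the label of its nearest denser
    # point, but only if that point is within border_factor * eps
    for i in order:
        if labels[i] == -1 and parent[i] != -1 and delta[i] <= border_factor * eps:
            labels[i] = labels[parent[i]]
    return labels


# Two groups on a line, one low-density border point (x = 5.5), one far outlier
xs = [0, 1, 2, 3, 4, 5.5, 20, 21, 22, 23, 24, 100]
labels = dp_dbscan_sketch([(x, 0.0) for x in xs], d_c=1.5, n_centers=2)
print(labels)
```

In this toy example, DBSCAN alone leaves the point at x = 5.5 as noise, and the final stage attaches it to the neighbouring cluster, while the distant point at x = 100 remains noise because of the `border_factor` guard.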
Acknowledgement
This work is supported in part by the National Natural Science Foundation of China under Grant No. 61473045, and by the Natural Science Foundation of Liaoning Province under Grant No. 20170540013.
Copyright information
© 2019 Springer Nature Switzerland AG
About this paper
Cite this paper
Hou, J., Lv, C., Zhang, A., E, X. (2019). Merging DBSCAN and Density Peak for Robust Clustering. In: Tetko, I., Kůrková, V., Karpov, P., Theis, F. (eds) Artificial Neural Networks and Machine Learning – ICANN 2019: Text and Time Series. ICANN 2019. Lecture Notes in Computer Science, vol 11730. Springer, Cham. https://doi.org/10.1007/978-3-030-30490-4_48
DOI: https://doi.org/10.1007/978-3-030-30490-4_48
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-30489-8
Online ISBN: 978-3-030-30490-4
eBook Packages: Computer Science, Computer Science (R0)