Abstract
The density peaks clustering (DPC) algorithm has been widely applied in many fields owing to its novelty and efficiency. However, the original DPC algorithm and many of its variants use the Euclidean distance to estimate local density and relative distance, which degrades clustering performance on datasets with particular shapes, such as manifold datasets. To address this issue, we propose a density peaks clustering algorithm with connected local density and punished relative distance (DPC–CLD–PRD). Specifically, the proposed approach first computes the distance matrix between pairs of data points using the flexible connectivity distance metric. It then calculates the connected local density of each data point by combining the flexible connectivity distance measure with the k-nearest neighbor method. Finally, it obtains the punished relative distance of each data point by introducing a connectivity estimation strategy into the distance optimization process. Experiments on synthetic, real-world, and image datasets demonstrate the effectiveness of the proposed algorithm.
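For orientation, the following minimal Python sketch shows where the distance metric enters DPC. Both the local density ρ and the relative distance δ are computed from one precomputed distance matrix; cluster centres are taken as the points with the largest γ = ρδ, and every other point inherits the label of its nearest higher-density neighbour, as in Rodriguez and Laio's original scheme. The sketch assumes a k-nearest-neighbour Gaussian density and uses the minimax-path (bottleneck) distance as an illustrative connectivity-based alternative to the Euclidean distance. These are stand-ins: the paper's flexible connectivity distance, connected local density, and punished relative distance are not reproduced here, and all function names are hypothetical.

```python
import numpy as np


def euclidean_distances(X):
    """Pairwise Euclidean distance matrix (the metric of the original DPC)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def minimax_path_distances(D):
    """Illustrative connectivity distance (assumption, not the paper's metric):
    for each pair, the smallest achievable largest edge over all paths,
    computed with a Floyd-Warshall style update."""
    M = D.copy()
    for k in range(M.shape[0]):
        M = np.minimum(M, np.maximum(M[:, k:k + 1], M[k:k + 1, :]))
    return M


def knn_local_density(D, k):
    """Assumed kNN Gaussian density: rho_i = exp(-(1/k) * sum of squared kNN distances)."""
    knn = np.sort(D, axis=1)[:, 1:k + 1]  # column 0 is the point itself (distance 0)
    return np.exp(-(knn ** 2).sum(axis=1) / k)


def relative_distance(D, rho):
    """delta_i: distance to the nearest point of higher density (ties in rho are
    broken by index). The densest point gets its largest distance, as in the original DPC."""
    n = len(rho)
    order = np.argsort(-rho, kind="stable")
    delta = np.empty(n)
    parent = np.full(n, -1)
    delta[order[0]] = D[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(D[i, higher])]
        delta[i], parent[i] = D[i, j], j
    return delta, parent, order


def dpc(D, k, n_clusters):
    """Pick the n_clusters points with the largest gamma = rho * delta as centres,
    then assign the rest in decreasing density order to their parent's cluster."""
    rho = knn_local_density(D, k)
    delta, parent, order = relative_distance(D, rho)
    gamma = rho * delta
    gamma[order[0]] = np.inf  # the densest point has no parent, so it must be a centre
    centres = np.argsort(-gamma, kind="stable")[:n_clusters]
    labels = np.full(len(rho), -1)
    labels[centres] = np.arange(n_clusters)
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels
```

Passing `minimax_path_distances(euclidean_distances(X))` instead of `euclidean_distances(X)` to the hypothetical `dpc` function is the kind of metric swap the abstract motivates: along a densely sampled chain, the minimax distance between its two ends stays as small as the largest gap between neighbouring points, even when their straight-line distance is large.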
Availability of data and materials
Data and materials will be made available on reasonable request.
Acknowledgements
This work is supported in part by the National Natural Science Foundation of China (Nos. 61876101 and 61806114) and in part by the Natural Science Foundation of Shandong Province, China, under Grant ZR2023MF079.
Funding
This work was supported by the National Natural Science Foundation of China (Nos. 61806114 and 61876101) and the China Postdoctoral Science Foundation (Nos. 2018M642695 and 2019T120607).
Author information
Contributions
JX contributed to conceptualization, methodology, and writing of the original draft. WZ contributed to conceptualization, supervision, and review and editing. YZ performed validation. XL contributed to review and funding acquisition.
Ethics declarations
Ethical approval
Not applicable.
Conflict of interest
There are no competing interests.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Xiong, J., Zang, W., Zhao, Y. et al. Density peaks clustering algorithm with connected local density and punished relative distance. J Supercomput 80, 6140–6168 (2024). https://doi.org/10.1007/s11227-023-05688-0
DOI: https://doi.org/10.1007/s11227-023-05688-0