Density peaks clustering algorithm with connected local density and punished relative distance

The Journal of Supercomputing

Abstract

The density peaks clustering (DPC) algorithm has been widely applied in many fields because of its novelty and efficiency. However, the original DPC algorithm and many of its variants use Euclidean distance to estimate both local density and relative distance, which degrades clustering performance on datasets with particular shapes, such as manifold datasets. To address this issue, we propose a density peaks clustering algorithm with connected local density and punished relative distance (DPC–CLD–PRD). Specifically, the proposed approach first computes the distance matrix between data pairs using a flexible connectivity distance metric. It then calculates the connected local density of each data point by combining the flexible connectivity distance with the k-nearest neighbor method. Finally, it obtains the punished relative distance of each data point by introducing a connectivity estimation strategy into the distance optimization process. Experiments on synthetic, real-world, and image datasets demonstrate the effectiveness of the proposed algorithm.
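The three steps above can be illustrated with a minimal Python sketch. It is not the DPC–CLD–PRD method itself, because the abstract does not give the formulas for the flexible connectivity distance or the punishment term. As a labeled stand-in, the sketch uses the minimax path distance (the smallest possible "largest hop" over all paths between two points) as the connectivity distance, an exponential of the mean k-nearest-neighbor distance as the local density, and the standard DPC relative distance with no punishment. All function names and the parameter `k` are hypothetical.

```python
import numpy as np


def euclidean_matrix(X):
    """Pairwise Euclidean distances between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def minimax_path_distance(D):
    """Stand-in connectivity distance (an assumption, not the paper's metric).

    M[i, j] is the smallest achievable largest hop over all paths from i to j,
    computed with a Floyd-Warshall-style pass that uses max in place of +.
    """
    M = D.copy()
    for k in range(len(M)):
        via_k = np.maximum(M[:, k][:, None], M[k, :][None, :])
        M = np.minimum(M, via_k)
    return M


def knn_density(M, k):
    """Local density from the k nearest neighbors under distance matrix M.

    The exponential form is an illustrative choice, not the paper's formula.
    """
    knn = np.sort(M, axis=1)[:, 1:k + 1]  # column 0 is the point itself
    return np.exp(-knn.mean(axis=1))


def relative_distance(M, rho):
    """Standard DPC relative distance: distance to the nearest denser point.

    No punishment term is applied here; the paper's variant is not reproduced.
    """
    n = len(rho)
    delta = np.empty(n)
    parent = np.full(n, -1)
    order = np.argsort(-rho, kind="stable")  # densest first
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = M[i].max()  # the global peak gets the largest distance
            continue
        higher = order[:rank]
        j = higher[np.argmin(M[i, higher])]
        delta[i] = M[i, j]
        parent[i] = j
    return delta, parent


if __name__ == "__main__":
    # Two parallel chains of points, 10 units apart.
    X = np.array([[0, 0], [1, 0], [2, 0], [3, 0],
                  [0, 10], [1, 10], [2, 10], [3, 10]], dtype=float)
    M = minimax_path_distance(euclidean_matrix(X))
    rho = knn_density(M, k=2)
    delta, parent = relative_distance(M, rho)
    print("candidate centers:", np.argsort(-(rho * delta))[:2])
```

In this toy example the two ends of one chain are 3.0 apart in Euclidean distance but only 1.0 apart under the minimax path distance, since they are linked by unit-length hops. That is the kind of behavior a connectivity-based metric is meant to capture on elongated or manifold-shaped clusters.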

Availability of data and materials

Data and materials will be made available on reasonable request.

Acknowledgements

This work is supported in part by the National Science Foundation of China (Nos. 61876101, 61806114) and in part by the Natural Science Foundation of Shandong Province, China (Grant No. ZR2023MF079).

Funding

This work was supported by the National Natural Science Foundation of China (Nos. 61806114, 61876101) and the China Postdoctoral Science Foundation (Nos. 2018M642695, 2019T120607).

Author information

Contributions

JX contributed to conceptualization, methodology, and writing the original draft. WZ contributed to conceptualization, supervision, and review and editing. YZ validated the study. XL contributed to review and funding acquisition.

Corresponding author

Correspondence to Wenke Zang.

Ethics declarations

Ethical approval

Not applicable.

Conflict of interest

There are no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Xiong, J., Zang, W., Zhao, Y. et al. Density peaks clustering algorithm with connected local density and punished relative distance. J Supercomput 80, 6140–6168 (2024). https://doi.org/10.1007/s11227-023-05688-0

  • DOI: https://doi.org/10.1007/s11227-023-05688-0
