
A robust clustering method with noise identification based on directed K-nearest neighbor graph

Published: 07 October 2022

Abstract

Obtaining the optimal number of clusters and generating reliable clustering results on nonlinear manifolds are necessary but challenging tasks. Most existing clustering algorithms have considerable limitations in dealing with local and nonlinear data patterns, whereas graph-based clustering has shown impressive performance in identifying clusters in such data. In this paper, we propose a robust clustering method with noise cutting based on a directed k-nearest neighbor graph (CDKNN), which automatically identifies the desired number of clusters and simultaneously produces reliable clustering results on nonlinear, non-overlapping but locally tightly connected data patterns. The method uses the k-nearest neighbor graph to represent complex nonlinear datasets and applies an adaptive parameter-selection process so that the clustering better fits specific data patterns. Because sparse nodes are cut out of the directed k-nearest neighbor graph, the method is robust to noise in datasets of arbitrary shape. We validate the method on simulated and UCI real-world datasets by comparing it to the k-means, DBSCAN, OPTICS, AP, SC, and CutPC algorithms in terms of clustering ACC, ARI, NMI, and FMI. The experimental results confirm that the proposed method outperforms the alternative nonlinear clustering methods.
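The core idea the abstract describes can be illustrated with a minimal sketch (this is a simplified illustration of the general directed k-NN noise-cutting idea, not the authors' exact CDKNN algorithm; the `ratio` threshold and the use of mutual edges for component-finding are assumptions for the example): in a directed k-NN graph every node has out-degree k, but sparse noise points tend to have low in-degree, because few other points count them among their k nearest neighbors. Cutting low in-degree nodes removes noise, and connected components of the remaining graph give the clusters, so the cluster number emerges automatically.

```python
from collections import defaultdict

def knn_directed_graph(points, k):
    """Directed k-NN graph: edge i -> j iff j is among i's k nearest neighbors."""
    edges = {}
    for i, p in enumerate(points):
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(p, q)), j)
            for j, q in enumerate(points) if j != i
        )
        edges[i] = [j for _, j in dists[:k]]
    return edges

def cut_noise(edges, k, ratio=0.5):
    """Mark sparse nodes as noise: in-degree below ratio * k (assumed threshold)."""
    indeg = defaultdict(int)
    for nbrs in edges.values():
        for j in nbrs:
            indeg[j] += 1
    return {i for i in edges if indeg[i] < ratio * k}

def cluster(points, k=3, ratio=0.5):
    """Clusters = connected components of the mutual k-NN graph after noise cutting."""
    edges = knn_directed_graph(points, k)
    noise = cut_noise(edges, k, ratio)
    # keep only mutual (bidirectional) edges between non-noise nodes
    adj = defaultdict(set)
    for i, nbrs in edges.items():
        if i in noise:
            continue
        for j in nbrs:
            if j not in noise and i in edges[j]:
                adj[i].add(j)
                adj[j].add(i)
    labels, seen, cid = {}, set(), 0
    for i in range(len(points)):
        if i in noise or i in seen:
            continue
        stack = [i]
        while stack:  # DFS over one connected component
            u = stack.pop()
            if u in seen:
                continue
            seen.add(u)
            labels[u] = cid
            stack.extend(adj[u] - seen)
        cid += 1
    return labels, noise

# Two tight squares plus one outlier: the outlier has in-degree 0 and is cut.
pts = [(0, 0), (0, 1), (1, 0), (1, 1),
       (10, 10), (10, 11), (11, 10), (11, 11),
       (5, 30)]
labels, noise = cluster(pts, k=3)
```

With this toy input the outlier at index 8 is identified as noise and the remaining points fall into two components, so the cluster number (two) is discovered without being specified in advance.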

References

[1]
J. Vargas Muñoz, M.A. Gonçalves, Z. Dias, R. da S. Torres, Hierarchical clustering-based graphs for large scale approximate nearest neighbor search, Pattern Recogn. 96 (2019). url: https://www.sciencedirect.com/science/article/pii/S0031320319302730.
[2]
S. Horng, M. Su, Y. Chen, T. Kao, R. Chen, J. Lai, C.D. Perkasa, A novel intrusion detection system based on hierarchical clustering and support vector machines, Expert Syst. Appl. 38 (1) (2011) 306–313.
[3]
P. Mok, H. Huang, Y. Kwok, J. Au, A robust adaptive clustering analysis method for automatic identification of clusters, Pattern Recogn. 45 (8) (2012) 3017–3033. url: https://www.sciencedirect.com/science/article/pii/S0031320312000647.
[4]
J. MacQueen, Some methods for classification and analysis of multivariate observations, in: Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics, Univ. California Press, Berkeley, Calif., 1967, pp. 281–297.
[5]
S. Xia, D. Peng, D. Meng, C. Zhang, G. Wang, E. Giem, W. Wei, Z. Chen, Ball k-means: Fast adaptive clustering with no bounds, IEEE Trans. Pattern Anal. Mach. Intell. 44 (1) (2022) 87–99.
[6]
Y. Zhu, K.M. Ting, M.J. Carman, Density-ratio based clustering for discovering clusters with varying densities, Pattern Recogn. 60 (2016) 983–997. url: https://www.sciencedirect.com/science/article/pii/S0031320316301571.
[7]
M. Ester, H. Kriegel, J. Sander, X. Xu, A density-based algorithm for discovering clusters in large spatial databases with noise, in: E. Simoudis, J. Han, U.M. Fayyad (Eds.), Proceedings of the Second International Conference on Knowledge Discovery and Data Mining (KDD-96), AAAI Press, Portland, Oregon, USA, 1996, pp. 226–231.
[8]
M. Ankerst, M.M. Breunig, H. Kriegel, J. Sander, OPTICS: ordering points to identify the clustering structure, in: A. Delis, C. Faloutsos, S. Ghandeharizadeh (Eds.), Proceedings ACM SIGMOD International Conference on Management of Data (SIGMOD 1999), ACM Press, Philadelphia, Pennsylvania, USA, 1999, pp. 49–60.
[9]
A. Rodriguez, A. Laio, Clustering by fast search and find of density peaks, Science 344 (6191) (2014) 1492–1496.
[10]
B.J. Frey, D. Dueck, Clustering by passing messages between data points, Science 315 (5814) (2007) 972–976.
[11]
C.-D. Wang, J.-H. Lai, C.Y. Suen, J.-Y. Zhu, Multi-exemplar affinity propagation, IEEE Trans. Pattern Anal. Mach. Intell. 35 (9) (2013) 2223–2237.
[12]
F. Shang, L. Jiao, J. Shi, F. Wang, M. Gong, Fast affinity propagation clustering: A multilevel approach, Pattern Recogn. 45 (1) (2012) 474–486. url: https://www.sciencedirect.com/science/article/pii/S0031320311002007.
[13]
M. Liu, X. Jiang, A.C. Kot, A multi-prototype clustering algorithm, Pattern Recogn. 42 (5) (2009) 689–698. url: https://www.sciencedirect.com/science/article/pii/S0031320308003798.
[14]
A.Y. Ng, M.I. Jordan, Y. Weiss, On spectral clustering: Analysis and an algorithm, in: Proceedings of the 14th International Conference on Neural Information Processing Systems: Natural and Synthetic, MIT Press, Cambridge, MA, USA, 2001, pp. 849–856.
[15]
Y. Qin, Z.L. Yu, C.-D. Wang, Z. Gu, Y. Li, A novel clustering method based on hybrid k-nearest-neighbor graph, Pattern Recogn. 74 (2018) 1–14. url: https://www.sciencedirect.com/science/article/pii/S0031320317303497.
[16]
L.-T. Li, Z.-Y. Xiong, Q.-Z. Dai, Y.-F. Zha, Y.-F. Zhang, J.-P. Dan, A novel graph-based clustering method using noise cutting, Inform. Syst. 91 (2020). url: https://www.sciencedirect.com/science/article/pii/S0306437920300156.
[17]
Y. Kim, H. Do, S.B. Kim, Outer-points shaver: Robust graph-based clustering via node cutting, Pattern Recogn. 97 (2020). url: https://www.sciencedirect.com/science/article/pii/S0031320319303048.
[18]
D. Yan, Y. Wang, J. Wang, H. Wang, Z. Li, K-nearest neighbor search by random projection forests, IEEE Trans. Big Data 7 (1) (2021) 147–157.
[19]
S.S. Stevens, Mathematics, measurement and psychophysics, in: S.S. Stevens (Ed.), Handbook of Experimental Psychology, Wiley, New York, 1951.
[20]
Q. Zhu, J. Feng, J. Huang, Natural neighbor: A self-adaptive neighborhood method without parameter k, Pattern Recogn. Lett. 80 (2016) 30–36,. url:  https://www.sciencedirect.com/science/article/pii/S016786551630085X.
[21]
R. Tarjan, Depth-first search and linear graph algorithms, in: 12th Annual Symposium on Switching and Automata Theory (SWAT 1971), 1971, pp. 114–121.
[22]
M. Wu, B. Schölkopf, A local learning approach for clustering, in: B. Schölkopf, J. Platt, T. Hofmann (Eds.), Advances in Neural Information Processing Systems 19, MIT Press, 2007, pp. 1529–1536.
[23]
L. McInnes, J. Healy, Accelerated hierarchical density based clustering, in: 2017 IEEE International Conference on Data Mining Workshops (ICDMW), 2017, pp. 33–42.
[24]
N.X. Vinh, J. Epps, J. Bailey, Information theoretic measures for clusterings comparison: Variants, properties, normalization and correction for chance, J. Mach. Learn. Res. 11 (2010) 2837–2854.
[25]
D.F. Andrews, Plots of high-dimensional data, Biometrics 28 (1) (1972) 125–136.

Cited By

  • (2023) Parallel Strong Connectivity Based on Faster Reachability, Proceedings of the ACM on Management of Data 1 (2), 1–29, 20 Jun 2023. DOI: 10.1145/3589259


Published In

Neurocomputing, Volume 508, Issue C, Oct 2022, 338 pages

Publisher

Elsevier Science Publishers B.V., Netherlands

        Author Tags

        1. Directed k-nearest neighbor graph
        2. Graph-based clustering
        3. Unsupervised learning
        4. Nonlinear dataset
        5. Noise cutting

        Qualifiers

        • Research-article

