research-article

CK-Modes Clustering Algorithm Based on Node Cohesion in Labeled Property Graph

Authors:

Biao Qin

Volume 34, Issue 5

Pages 1152 - 1166

https://doi.org/10.1007/s11390-019-1966-0

Published: 01 September 2019

Abstract

Choosing the cluster number K and the initial centroids is essential to the K-modes clustering algorithm. However, most improved methods based on K-modes specify the K value manually and generate the initial centroids randomly, which makes the clustering algorithm heavily dependent on human decisions and unstable in iteration time. To overcome this limitation, we propose a cohesive K-modes (CK-modes) algorithm that generates the cluster number K and the initial centroids automatically. Specifically, we construct a labeled property graph based on index-free adjacency to capture both the global and local cohesion of the nodes in the sample of the input datasets. The cohesive node, calculated from the property similarity, is used to split the graph into a K-node tree that determines the K value, and the initial centroids are then selected from the split subtrees. Because the property graph construction and the cohesion calculation are performed only once, they account for a small share of the execution time of a clustering operation with multiple iterations, yet they significantly accelerate clustering convergence. Experimental validation on both real-world and synthetic datasets shows that the CK-modes algorithm outperforms the state-of-the-art algorithms.
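To make the dependence on K and the initial centroids concrete, the following is a minimal sketch of plain K-modes, not of CK-modes itself. It assumes the usual simple matching dissimilarity and a per-attribute mode update. The names `matching_distance` and `k_modes` and the toy records are hypothetical and do not come from the paper. Both K and the starting centroids are passed in by the caller, which is exactly the manual step the abstract proposes to automate.

```python
from collections import Counter


def matching_distance(a, b):
    """Simple matching dissimilarity: count of attributes on which a and b differ."""
    return sum(x != y for x, y in zip(a, b))


def k_modes(records, centroids, max_iter=100):
    """Plain K-modes (illustrative sketch, not the CK-modes method).

    K is implied by len(centroids); the caller must supply the initial centroids.
    Returns (assignment, centroids, iterations_used).
    """
    centroids = [tuple(c) for c in centroids]
    assignment = None
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Assignment step: attach each record to its nearest centroid.
        new_assignment = [
            min(range(len(centroids)),
                key=lambda j: matching_distance(r, centroids[j]))
            for r in records
        ]
        if new_assignment == assignment:
            break  # no record changed cluster, so the procedure has converged
        assignment = new_assignment
        # Update step: each centroid becomes the per-attribute mode of its members.
        for j in range(len(centroids)):
            members = [r for r, a in zip(records, assignment) if a == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return assignment, centroids, iteration


# Hypothetical categorical records, used only for illustration.
records = [
    ("red", "small", "round"),
    ("red", "small", "oval"),
    ("red", "medium", "round"),
    ("blue", "large", "square"),
    ("blue", "large", "oval"),
    ("green", "large", "square"),
]
labels, modes, n_iter = k_modes(records, [records[0], records[3]])
```

Calling `k_modes` with different starting centroids on the same records can change both the final partition and the number of iterations used, which is the instability the abstract attributes to random initialization.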





Published In

Journal of Computer Science and Technology Volume 34, Issue 5

Sep 2019

228 pages

ISSN: 1000-9000

Copyright © 2019 Springer Science+Business Media, LLC, part of Springer Nature.

Publisher

Springer-Verlag

Berlin, Heidelberg
