A Clustering Algorithm for Automatically Determining the Number of Clusters Based on Coefficient of Variation

Authors:

Kun Zhang

ICBDR '18: Proceedings of the 2nd International Conference on Big Data Research

Pages 100 - 106

https://doi.org/10.1145/3291801.3291825

Published: 27 October 2018

Abstract

The k-means algorithm is a typical partition-based clustering algorithm. The k-means++ algorithm improves on it by addressing the sensitivity of traditional k-means to the choice of initial centers. However, the original k-means++ algorithm is sensitive to outliers and requires the number of clusters to be set manually. We propose an improved k-means++ clustering algorithm, named CV-means++, that automatically determines the number of clusters based on the coefficient of variation. First, we propose a method that selects initial centers using a density index of the data points, which avoids selecting abnormal data. Second, we introduce the concept of the coefficient of variation and compare the average intra-cluster coefficient of variation with the smallest inter-cluster coefficient of variation of k+ clusters (k+ > k) to determine whether the number of clusters is optimal. Experiments on UCI datasets demonstrate the effectiveness of the algorithm.
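The abstract does not give formulas, so the following is only a minimal sketch of the two standard building blocks it refers to, not the CV-means++ method itself. It assumes the usual statistical definition of the coefficient of variation (standard deviation divided by mean) and the original k-means++ seeding by squared-distance sampling. All function names are hypothetical, and measuring intra-cluster variation over point-to-center distances is an assumption made for illustration; the density-index seeding of the paper is not reproduced here.

```python
import math
import random
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


def kmeans_pp_seeds(points, k, rng):
    """Standard k-means++ seeding (D^2 sampling).

    This is the baseline seeding, NOT the density-based variant
    described in the abstract.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        if sum(d2) == 0:
            # Every point coincides with a center; no further seeds exist.
            break
        centers.append(rng.choices(points, weights=d2, k=1)[0])
    return centers


def intra_cluster_cv(cluster, center):
    """CV of point-to-center distances (an assumed, illustrative definition)."""
    return coefficient_of_variation([math.dist(p, center) for p in cluster])
```

A low value from `intra_cluster_cv` indicates that the points of a cluster lie at similar distances from its center. How CV-means++ combines the intra-cluster and inter-cluster values into a stopping rule is specified in the full paper, not in the abstract.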

Index Terms

Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis

Published In

ICBDR '18: Proceedings of the 2nd International Conference on Big Data Research

October 2018

221 pages

ISBN: 9781450364768

DOI: 10.1145/3291801

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

In-Cooperation

Shandong University
University of Queensland
Dalian Maritime University

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

ICBDR 2018

ICBDR 2018: 2018 The 2nd International Conference on Big Data Research

October 27 - 29, 2018

Weihai, China
