DOI: 10.1145/3291801.3291825
research-article

A Clustering Algorithm for Automatically Determining the Number of Clusters Based on Coefficient of Variation

Published: 27 October 2018

Abstract

The k-means algorithm is a typical partition-based clustering algorithm. The k-means++ algorithm is a high-quality clustering algorithm that addresses the sensitivity of traditional k-means to its initial centers. However, the original k-means++ algorithm is sensitive to outliers and requires the number of clusters to be set manually. We propose an improved k-means++ clustering algorithm, named CV-means++, that automatically determines the number of clusters based on the coefficient of variation. First, we propose a method that selects initial centers using a density index of the data points, which avoids selecting abnormal data. Second, we introduce the concept of the coefficient of variation and evaluate the relationship between the average intra-cluster coefficient of variation and the smallest inter-cluster coefficient of variation of k+ (k+ > k) clusters to determine whether the number of clusters is optimal. Experiments on UCI datasets demonstrate the effectiveness of the algorithm.
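
The abstract does not give formulas, so the following is only a minimal sketch of two building blocks it mentions, under stated assumptions. It uses the standard definition of the coefficient of variation (standard deviation divided by the mean) and the original k-means++ seeding of Arthur and Vassilvitskii (sampling proportional to squared distance), not the density-index seeding of CV-means++. The helper `intra_cluster_cv` and its choice of point-to-center distances as the measured quantity are hypothetical and are not taken from the paper.

```python
import math
import random
import statistics


def coefficient_of_variation(values):
    """Population standard deviation divided by the mean (standard definition)."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return statistics.pstdev(values) / mean


def kmeanspp_seeds(points, k, rng):
    """Original k-means++ seeding: pick k centers by D^2-weighted sampling.

    Assumes `points` holds at least k distinct coordinate tuples.
    """
    centers = [rng.choice(points)]
    while len(centers) < k:
        # Squared distance from each point to its nearest chosen center.
        weights = [min(math.dist(p, c) ** 2 for c in centers) for p in points]
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers


def intra_cluster_cv(cluster, center):
    """Hypothetical helper: CV of the point-to-center distances in one cluster."""
    return coefficient_of_variation([math.dist(p, center) for p in cluster])


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
    seeds = kmeanspp_seeds(data, k=2, rng=random.Random(0))
    print(seeds)
    print(coefficient_of_variation([2, 4, 4, 4, 5, 5, 7, 9]))
```

Because already chosen centers have zero sampling weight, the seeding step tends to spread centers across well-separated groups; an isolated outlier also gets a large weight, which is the sensitivity the abstract refers to.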

Cited By

  • (2023) Efficient Method for Continuous IoT Data Stream Indexing in the Fog-Cloud Computing Level. Big Data and Cognitive Computing 7:2 (119). DOI: 10.3390/bdcc7020119. Online publication date: 14-Jun-2023
  • (2020) Optimized K-Means clustering algorithm using an intelligent stable-plastic variational autoencoder with self-intrinsic cluster validation mechanism. Proceedings of the 2nd International Conference on Intelligent and Innovative Computing Applications (1-11). DOI: 10.1145/3415088.3415125. Online publication date: 24-Sep-2020

    Published In

    ICBDR '18: Proceedings of the 2nd International Conference on Big Data Research
    October 2018
    221 pages
    ISBN:9781450364768
    DOI:10.1145/3291801
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    In-Cooperation

    • Shandong University
    • University of Queensland
    • Dalian Maritime University

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Clustering
    2. Coefficient of variation
    3. Density index
    4. K-means++

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICBDR 2018
