DOI: 10.1145/3501409.3501648
research-article

Outlier Detection Method based on Improved K-means Clustering Algorithm

Published: 31 December 2021 Publication History

Abstract

Outlier detection is an important task in data mining and analysis. Existing K-means-based outlier detection methods are easily misled by outliers during clustering and therefore detect outliers poorly. To solve this problem, we propose an outlier detection method based on an improved K-means clustering algorithm. The method adaptively detects edge points (both outliers and edge points of other clusters) using a local threshold derived from the distribution of distances between the data points in a cluster and its centroid. Edge points are ignored when the centroid is updated, so they cannot mislead the update. After clustering completes, the farthest point in each cluster is iteratively moved into an outlier cluster. The iteration ends when the average distance of the data in the cluster stabilizes or the maximum number of iterations is reached. The method retains the ease of use of traditional K-means while accelerating the clustering iterations and accurately finding outliers in data sets. Experimental results show that the algorithm detects outliers effectively and performs well on multiple data sets.
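
The two phases described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not state how the local threshold or the stability test is computed, so the sketch assumes a threshold of mean plus `alpha` standard deviations of the in-cluster distances, and treats a cluster as stable once removing its farthest point lowers the average distance by at most a fraction `tol`. The function name `kmeans_edge_outliers` and all of its parameters are hypothetical, and centroids are kept fixed during the second phase as a simplification.

```python
import numpy as np


def kmeans_edge_outliers(X, k, init=None, n_iter=50, peel_iter=20,
                         alpha=2.0, tol=0.1, seed=0):
    """Hypothetical sketch of the two phases described in the abstract.

    Assumptions (not from the paper): the local threshold is
    mean + alpha * std of the in-cluster distances, and a cluster is
    "stable" once removing its farthest point lowers the average
    distance by at most a fraction tol.
    """
    X = np.asarray(X, dtype=float)
    if init is None:
        rng = np.random.default_rng(seed)
        centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    else:
        centroids = np.asarray(init, dtype=float).copy()

    def assign(c):
        # Distance from every point to every centroid, then nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - c[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        return labels, dists[np.arange(len(X)), labels]

    # Phase 1: K-means whose centroid update skips edge points.
    for _ in range(n_iter):
        labels, own = assign(centroids)
        new_centroids = centroids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size == 0:
                continue  # empty cluster: keep its previous centroid
            d = own[members]
            core = members[d <= d.mean() + alpha * d.std()]  # assumed threshold
            new_centroids[j] = X[core].mean(axis=0)
        converged = np.allclose(new_centroids, centroids)
        centroids = new_centroids
        if converged:
            break

    # Phase 2: per cluster, move the farthest points into the outlier set.
    labels, own = assign(centroids)
    is_outlier = np.zeros(len(X), dtype=bool)
    for j in range(k):
        members = np.flatnonzero(labels == j)
        order = members[np.argsort(own[members])[::-1]]  # farthest first
        d = own[order]
        for t in range(min(peel_iter, len(order) - 1)):
            before, after = d[t:].mean(), d[t + 1:].mean()
            if before - after <= tol * before:
                break  # average distance has stabilised (assumed test)
            is_outlier[order[t]] = True
    return labels, centroids, is_outlier
```

Because edge points are excluded from the mean in the first phase, a single distant point has little pull on its cluster's centroid, which is the effect the abstract attributes to the method. The values of `alpha` and `tol` would need tuning per data set.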

    Published In

    EITCE '21: Proceedings of the 2021 5th International Conference on Electronic Information Technology and Computer Engineering
    October 2021
    1723 pages
    ISBN:9781450384322
    DOI:10.1145/3501409
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Clustering algorithm
    2. K-means
    3. Machine learning
    4. Outlier detection

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    EITCE 2021

    Acceptance Rates

    EITCE '21 Paper Acceptance Rate 294 of 531 submissions, 55%;
    Overall Acceptance Rate 508 of 972 submissions, 52%
