Abstract
The K-means algorithm chooses its initial cluster centers at random, which can cause it to converge to a locally optimal clustering result. To address this issue, a method for choosing the initial cluster centers using information entropy is proposed. The method divides the dataset evenly into more than K data blocks, uses the entropy method to obtain the value of the objective function for each data block, and selects the centroids of the K data blocks with the smallest objective function values as the initial cluster centers. Using the entropy method to ensure the efficiency of initial cluster center selection, an anomaly detection method is then proposed. The experimental results show that this method performs better than the traditional K-means algorithm in both clustering quality and anomaly detection ability.
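The initialization step described above can be sketched roughly as follows. This is a minimal illustration, not the chapter's implementation: the abstract does not define the entropy method or the objective function, so the sketch assumes the common entropy weight method for feature weights and takes the mean weighted squared distance to the block centroid as the per-block objective. The function names `entropy_weights` and `entropy_initial_centers` and the parameter `n_blocks` are hypothetical.

```python
import numpy as np


def entropy_weights(X):
    """Entropy weight method (assumed here): features whose values are
    less uniformly spread (lower entropy) receive a larger weight."""
    n, d = X.shape
    # Min-max normalise each feature to [0, 1]; guard constant columns.
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span = np.where(span > 0, span, 1.0)
    Z = (X - lo) / span
    # Turn each column into a distribution over samples; a column that
    # is all zeros is treated as uniform, so it ends up with zero weight.
    total = Z.sum(axis=0)
    P = np.where(total > 0, Z / np.where(total > 0, total, 1.0), 1.0 / n)
    # Shannon entropy per feature, using the convention 0 * log(0) = 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        logP = np.where(P > 0, np.log(P), 0.0)
    H = -(P * logP).sum(axis=0) / np.log(n)
    divergence = np.clip(1.0 - H, 0.0, None)
    if divergence.sum() == 0:
        return np.full(d, 1.0 / d)
    return divergence / divergence.sum()


def entropy_initial_centers(X, k, n_blocks):
    """Split X evenly into n_blocks > k blocks, score each block, and
    return the centroids of the k blocks with the smallest scores."""
    if n_blocks <= k:
        raise ValueError("n_blocks must be greater than k")
    w = entropy_weights(X)
    blocks = np.array_split(X, n_blocks)  # even split in row order
    centroids = np.array([b.mean(axis=0) for b in blocks])
    # Assumed objective: mean entropy-weighted squared distance of the
    # block's points to the block centroid (smaller means more compact).
    scores = np.array([
        (w * (b - c) ** 2).sum(axis=1).mean()
        for b, c in zip(blocks, centroids)
    ])
    best = np.argsort(scores)[:k]
    return centroids[best]
```

The returned array has shape `(k, n_features)` and could be used to seed a standard K-means run in place of random initialization, for example through the `init` argument of scikit-learn's `KMeans` with `n_init=1`. How the chapter orders the data before splitting, and how it flags anomalies after clustering, is not stated in the abstract and is left out of this sketch.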
Acknowledgements
This paper is supported in part by the National Natural Science Foundation of China under Grant No. 61672022, Key Disciplines of Computer Science and Technology of Shanghai Polytechnic University under Grant No. XXKZD1604, and the Graduate Innovation Program No. A01GY17F022.
Copyright information
© 2019 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Tan, W., Fang, X., Zhao, L., Tang, A. (2019). Anomaly Detection Algorithm Based on Cluster of Entropy. In: Sun, Y., Lu, T., Xie, X., Gao, L., Fan, H. (eds) Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2018. Communications in Computer and Information Science, vol 917. Springer, Singapore. https://doi.org/10.1007/978-981-13-3044-5_26
DOI: https://doi.org/10.1007/978-981-13-3044-5_26
Publisher Name: Springer, Singapore
Print ISBN: 978-981-13-3043-8
Online ISBN: 978-981-13-3044-5
eBook Packages: Computer Science, Computer Science (R0)