Abstract
Clustering algorithms (e.g., Gaussian mixture models, k-means) tackle the problem of grouping a set of elements so that elements in the same group (or cluster) are more similar to each other than to elements in other clusters. This simple concept underlies complex algorithms in many application areas, including sequence analysis and genotyping in bioinformatics, medical imaging, antimicrobial activity, market research, and social networking. However, as data volumes continue to grow, the performance of clustering algorithms is increasingly constrained by the memory subsystem. In this paper, we propose a novel and efficient implementation of Lloyd's k-means clustering algorithm that substantially reduces data movement along the memory hierarchy. Our contributions build on the fact that the vast majority of processors are equipped with powerful Single Instruction Multiple Data (SIMD) instructions that are, in most cases, underused. SIMD increases the computational power of the CPU and, if used wisely, offers an opportunity to reduce application data transfers by compressing and decompressing the data, especially for memory-bound applications. Our contributions include a SIMD-friendly data layout, in-register implementations of key functions, and SIMD-based compression. We demonstrate that our optimized SIMD-based compression method improves the performance and energy consumption of k-means by factors of 4.5x and 8.7x, respectively, on an Intel Core i7 (Haswell) machine, and by 22x and 22.2x on a Xeon Phi Knights Landing (KNL), running a single thread.
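To make the decompress-in-register idea concrete, the sketch below illustrates the pattern under stated assumptions; it is not the authors' implementation. It computes the squared Euclidean distance used in the k-means assignment step from points stored as 16-bit quantized values, widening and dequantizing them inside vector registers with AVX2/FMA so that only half the bytes cross the memory hierarchy. The uniform scale/offset quantization, the function name squared_distance_q16, and the requirement that dim be a multiple of 8 are assumptions made for this example.

/* Minimal illustrative sketch, assuming AVX2+FMA (compile with -mavx2 -mfma).
   Not the paper's code: the quantization scheme and names are hypothetical. */
#include <immintrin.h>
#include <stdint.h>

float squared_distance_q16(const uint16_t *point_q, const float *centroid,
                           int dim, float scale, float offset)
{
    __m256 acc = _mm256_setzero_ps();
    const __m256 vscale  = _mm256_set1_ps(scale);
    const __m256 voffset = _mm256_set1_ps(offset);
    for (int d = 0; d < dim; d += 8) {            /* dim assumed multiple of 8 */
        /* Load 8 packed 16-bit values: 16 bytes fetched instead of 32. */
        __m128i q = _mm_loadu_si128((const __m128i *)(point_q + d));
        /* Widen to 32-bit, convert to float, dequantize, all in registers. */
        __m256 p = _mm256_fmadd_ps(
            _mm256_cvtepi32_ps(_mm256_cvtepu16_epi32(q)), vscale, voffset);
        __m256 diff = _mm256_sub_ps(p, _mm256_loadu_ps(centroid + d));
        acc = _mm256_fmadd_ps(diff, diff, acc);   /* acc += diff * diff */
    }
    /* Horizontal sum: add the 8 partial sums held in acc. */
    __m128 s = _mm_add_ps(_mm256_castps256_ps128(acc),
                          _mm256_extractf128_ps(acc, 1));
    s = _mm_add_ps(s, _mm_movehl_ps(s, s));
    s = _mm_add_ss(s, _mm_shuffle_ps(s, s, 0x1));
    return _mm_cvtss_f32(s);
}

In the assignment step, a caller would evaluate this function against every centroid and keep the index of the minimum, so the compressed points are never materialized in memory as floats.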
Notes
Positron emission tomography.
Store instructions that skip the first level of the cache hierarchy; see the sketch following these notes.
The addition of all the data values within a vector register (a horizontal sum).
Network on Chip.
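The note on cache-bypassing stores most likely refers to non-temporal (streaming) stores; the sketch below shows that pattern under this assumption, using the AVX _mm256_stream_ps intrinsic. The function name stream_copy and the alignment and size requirements are illustrative, not taken from the paper.

/* Minimal sketch, assuming AVX and a 32-byte-aligned destination. */
#include <immintrin.h>

void stream_copy(float *dst, const float *src, int n)
{
    /* n assumed to be a multiple of 8; dst assumed 32-byte aligned. */
    for (int i = 0; i < n; i += 8) {
        __m256 v = _mm256_loadu_ps(src + i);
        _mm256_stream_ps(dst + i, v);  /* non-temporal store: avoids polluting
                                          the caches with write-once results */
    }
    _mm_sfence();  /* order the streaming stores before later accesses */
}

Such stores are attractive when k-means writes out results (e.g., new cluster assignments) that will not be re-read before the next pass over the dataset.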
About this article
Cite this article
Al Hasib, A., Cebrian, J.M. & Natvig, L. A vectorized k-means algorithm for compressed datasets: design and experimental analysis. J Supercomput 74, 2705–2728 (2018). https://doi.org/10.1007/s11227-018-2310-0