k-Means Clustering on Two-Level Memory Systems

Authors:

Michael A. Bender,

Jonathan Berry,

Simon D. Hammond,

Benjamin Moseley,

Cynthia A. Phillips

MEMSYS '15: Proceedings of the 2015 International Symposium on Memory Systems

Pages 197 - 205

https://doi.org/10.1145/2818950.2818977

Published: 05 October 2015

Abstract

In recent work we quantified the anticipated performance boost when a sorting algorithm is modified to leverage user-addressable "near-memory," which we call scratchpad. This architectural feature is expected in the Intel Knights Landing processors that will be used in DOE's next large-scale supercomputer.

This paper expands our analytical study of the scratchpad to consider k-means clustering, a classical data-analysis technique that is ubiquitous in the literature and in practice. We present new theoretical results using the model introduced in [13], which measures memory transfers and assumes that computations are memory-bound. Our theoretical results indicate that scratchpad-aware versions of k-means clustering can expect performance boosts for high-dimensional instances with relatively few cluster centers. These constraints may limit the practical impact of scratchpad for k-means acceleration, so we discuss their origins and practical implications. We corroborate our theory with experimental runs on a system instrumented to mimic one with scratchpad memory.

We also contribute a semi-formalization of the computational properties that are necessary and sufficient to predict a performance boost from scratchpad-aware variants of algorithms. We have observed and studied these properties in the context of sorting, and now clustering.

We conclude with some thoughts on the application of these properties to new areas. Specifically, we believe that dense linear algebra has similar properties to k-means, while sparse linear algebra and FFT computations are more similar to sorting. The sparse operations are more common in scientific computing, so we expect scratchpad to have significant impact in that area.
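
For readers unfamiliar with the computation the abstract refers to, the following is a minimal sketch of one classical Lloyd-style k-means iteration (see [27]). It is not the authors' scratchpad-aware variant, which the abstract does not describe; the function names (`squared_distance`, `lloyd_iteration`, `kmeans`) and the random seeding are assumptions made purely for illustration. The sketch shows the access pattern at issue: every point is streamed once per iteration, while the k centers (k × d values) are reread for each point.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def lloyd_iteration(points, centers):
    """One Lloyd pass: assign each point to its nearest center, then average."""
    k = len(centers)
    dim = len(centers[0])
    sums = [[0.0] * dim for _ in range(k)]
    counts = [0] * k
    for p in points:  # each point is read once per iteration
        # All k centers are consulted for every point (the reused working set).
        nearest = min(range(k), key=lambda j: squared_distance(p, centers[j]))
        counts[nearest] += 1
        for d in range(dim):
            sums[nearest][d] += p[d]
    new_centers = []
    for j in range(k):
        if counts[j] == 0:
            # Illustrative choice: an empty cluster keeps its old center.
            new_centers.append(list(centers[j]))
        else:
            new_centers.append([s / counts[j] for s in sums[j]])
    return new_centers


def kmeans(points, k, iterations=10, seed=0):
    """Run a fixed number of Lloyd passes from randomly sampled initial centers."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    for _ in range(iterations):
        centers = lloyd_iteration(points, centers)
    return centers
```

For example, `lloyd_iteration([[0, 0], [0, 2], [10, 0], [10, 2]], [[0, 0], [10, 0]])` moves the two centers to the means of the left and right pairs of points. Under the memory-bound assumption stated in the abstract, the reuse of the center array across all points is the part of this loop that a fast near-memory could plausibly serve.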

References

[1]

A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, Sept. 1988.

[2]

N. Ailon, R. Jaiswal, and C. Monteleoni. Streaming k-means approximation. In 23rd Annual Conference on Neural Information Processing Systems (NIPS), pages 10--18, 2009.

[3]

D. Ajwani, N. Sitchinava, and N. Zeh. Geometric algorithms for private-cache chip multiprocessors. In Proceedings of the Eighteenth Annual European Symposium on Algorithms (ESA), pages 75--86. 2010.

[4]

D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245--248, 2009.

[5]

L. Arge, M. T. Goodrich, M. Nelson, and N. Sitchinava. Fundamental parallel algorithms for private-cache chip multiprocessors. In Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 197--206, 2008.

[6]

L. Arge, M. T. Goodrich, and N. Sitchinava. Parallel external memory graph algorithms. In Proceedings of the IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pages 1--11, 2010.

[7]

D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1027--1035, 2007.

[8]

V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544--562, 2004.

[9]

P. Awasthi, M. Charikar, R. Krishnaswamy, and A. K. Sinop. The hardness of approximation of Euclidean k-means. CoRR, abs/1502.03316, 2015.

[10]

B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. Scalable k-means++. PVLDB, 5(7):622--633, 2012.

[11]

R. Banakar, S. Steinke, B.-S. Lee, M. Balakrishnan, and P. Marwedel. Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems. In Proceedings of the Tenth International Symposium on Hardware/Software Codesign (CODES), pages 73--78, 2002.

[12]

R. E. Bellman. Dynamic Programming. Princeton University Press, 1957.

[13]

M. A. Bender, J. Berry, S. D. Hammond, K. S. Hemmert, S. McCauley, B. Moore, B. Moseley, C. A. Phillips, D. Resnick, and A. Rodrigues. Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation. In Proc. 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Hyderabad, India, May 2015.

[14]

M. A. Bender, R. Ebrahimi, J. T. Fineman, G. Ghasemiesfeh, R. Johnson, and S. McCauley. Cache-adaptive algorithms. In Proceedings of the Twenty-Fifth Symposium on Discrete Algorithms (SODA), pages 116--128, 2014.

[15]

G. S. Brodal, E. D. Demaine, J. T. Fineman, J. Iacono, S. Langerman, and J. I. Munro. Cache-oblivious dynamic dictionaries with update/query tradeoffs. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1448--1456, 2010.

[16]

G. S. Brodal and R. Fagerberg. On the limits of cache-obliviousness. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing (STOC), pages 307--315, 2003.

[17]

M. Charikar, S. Guha, É. Tardos, and D. B. Shmoys. A constant-factor approximation algorithm for the k-median problem. J. Comput. Syst. Sci., 65(1):129--149, 2002.

[18]

M. Danilevsky and E. Koh. Information graph model and application to online advertising. In Proceedings of the 1st Workshop on User Engagement Optimization, UEO '13, pages 11--14, 2013.

[19]

A. Ene, S. Im, and B. Moseley. Fast clustering using mapreduce. In Proc. 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 681--689, 2011.

[20]

M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proceedings of the 40th Annual ACM Symposium on Foundations of Computer Science (FOCS), pages 285--297, 1999.

[21]

M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. ACM Transactions on Algorithms, 8(1):4, 2012.

[22]

S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng., 15(3):515--528, 2003.

[23]

S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proc. 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 359--366, 2000.

[24]

http://www.hpcwire.com/2014/06/24/micron-intel-reveal-memory-slice-knights-landing/.

[25]

W. Liao. Parallel k-means data clustering, 2011. code.

[26]

M. Lichman. UCI machine learning repository, 2013.

[27]

S. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theor., 28(2):129--137, Mar. 1982.

[28]

H. M. Moftah, W. H. Elmasry, N. El-Bendary, A. E. Hassanien, and K. Nakamatsu. Evaluating the effects of k-means clustering approach on medical images. In 12th International Conference on Intelligent Systems Design and Applications, ISDA 2012, Kochi, India, November 27--29, 2012, pages 455--459, 2012.

[29]

http://nnsa.energy.gov/mediaroom/pressreleases/trinity.

[30]

A. F. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, R. Oldfield, M. Weston, R. Risen, J. Cook, P. Rosenfeld, E. Cooper-Balls, and B. Jacob. The Structural Simulation Toolkit. SIGMETRICS Perform. Eval. Rev., 38(4):37--42, Mar. 2011.

[31]

N. Sitchinava and N. Zeh. A parallel buffer tree. In Proceedings of the Twenty-Fourth ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 214--223, 2012.

[32]

S. Steinke, L. Wehmeyer, B. Lee, and P. Marwedel. Assigning program and data objects to scratchpad for energy reduction. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE), pages 409--415, 2002.

[33]

M. Thorup. Quick k-median, k-center, and facility location for sparse graphs. SIAM J. Comput., 34(2):405--432, 2004.

[34]

http://insidehpc.com/2014/07/cray-wins-174-million-contract-trinity-supercomputer-based-knights-landing.

[35]

G. C. Tseng. Penalized and weighted k-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics, 23(17):2247--2255, Aug. 2007.

[36]

X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. F. M. Ng, B. Liu, P. S. Yu, Z. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg. Top 10 algorithms in data mining. Knowl. Inf. Syst., 14(1):1--37, 2008.

[37]

A. Zimek, E. Schubert, and H.-P. Kriegel. A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5):363--387, 2012.

Published In

MEMSYS '15: Proceedings of the 2015 International Symposium on Memory Systems

October 2015

278 pages

ISBN:9781450336048

DOI:10.1145/2818950

Copyright © 2015 ACM.

© 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Conference

MEMSYS '15

MEMSYS '15: International Symposium on Memory Systems

October 5 - 8, 2015

Washington, DC, USA

Cited By

Gómez-Luna J, Guo Y, Brocard S, Legriel J, Cimadomo R, Oliveira G, Singh G, Mutlu O (2023). Evaluating Machine Learning Workloads on Memory-Centric Computing Systems. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pages 35-49. Online publication date: Apr-2023.
https://doi.org/10.1109/ISPASS57527.2023.00013
Wang S, Sun Y, Bao Z (2020). On the efficiency of K-means clustering. Proceedings of the VLDB Endowment 14:2, pages 163-175. Online publication date: 1-Oct-2020.
https://dl.acm.org/doi/10.14778/3425879.3425887
Das R, Agrawal K, Bender M, Berry J, Moseley B, Phillips C, Scheideler C, Spear M (2020). How to Manage High-Bandwidth Memory Automatically. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pages 187-199. Online publication date: 6-Jul-2020.
https://dl.acm.org/doi/10.1145/3350755.3400233
Kwedlo W, Czochanski P (2019). A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality. IEEE Access 7, pages 42280-42297. Online publication date: 2019.
https://doi.org/10.1109/ACCESS.2019.2907885
Li L, Yu T, Zhao W, Fu H, Wang C, Tan L, Yang G, Thomson J (2018). Large-scale hierarchical k-means for heterogeneous many-core supercomputers. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pages 1-11. Online publication date: 11-Nov-2018.
https://dl.acm.org/doi/10.5555/3291656.3291674
https://dl.acm.org/doi/10.1109/SC.2018.00016
Leidel J, Jacob B (2018). Stake. Proceedings of the International Symposium on Memory Systems, pages 365-376. Online publication date: 1-Oct-2018.
https://dl.acm.org/doi/10.1145/3240302.3240307
Awan A, Ohara M, Ayguade E, Ishizaki K, Brorsson M, Vlassov V, Jacob B (2017). Identifying the potential of near data processing for Apache Spark. Proceedings of the International Symposium on Memory Systems, pages 60-67. Online publication date: 2-Oct-2017.
https://dl.acm.org/doi/10.1145/3132402.3132427
Singapura S, Kannan R, Prasanna V (2017). Optimal data layout for block-level random accesses to scratchpad. 2017 IEEE High Performance Extreme Computing Conference (HPEC), pages 1-7. Online publication date: Sep-2017.
https://doi.org/10.1109/HPEC.2017.8091088
Wozniak J, Wilde M, Foster I (2016). Challenges and Opportunities for Dataflow Processing on Exascale Computers. Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing, pages 1-5. Online publication date: 15-Sep-2016.
https://dl.acm.org/doi/10.1145/3292533.3292537
