k-Means Clustering on Two-Level Memory Systems

Published: 05 October 2015
DOI: 10.1145/2818950.2818977

Abstract

In recent work we quantified the anticipated performance boost when a sorting algorithm is modified to leverage user-addressable "near-memory," which we call scratchpad. This architectural feature is expected in the Intel Knights Landing processors that will be used in DOE's next large-scale supercomputer.
This paper expands our analytical study of the scratchpad to consider k-means clustering, a classical data-analysis technique that is ubiquitous in the literature and in practice. We present new theoretical results using the model introduced in [13], which measures memory transfers and assumes that computations are memory-bound. Our theoretical results indicate that scratchpad-aware versions of k-means clustering can expect performance boosts for high-dimensional instances with relatively few cluster centers. These constraints may limit the practical impact of scratchpad for k-means acceleration, so we discuss their origins and practical implications. We corroborate our theory with experimental runs on a system instrumented to mimic one with scratchpad memory.
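For readers less familiar with the baseline, the following is a minimal sketch of Lloyd's algorithm [27], the classical k-means heuristic, in plain Python. It is an illustration only, not the scratchpad-aware variant analyzed in the paper; the function names (`lloyd_iteration`, `lloyd`) and the convention that an empty cluster keeps its previous center are our own assumptions. The comments point out where the memory traffic comes from: each iteration streams all n d-dimensional points once, while the k centers are reused for every point.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def squared_distance(x: Sequence[float], c: Sequence[float]) -> float:
    """Squared Euclidean distance between two d-dimensional points."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))


def lloyd_iteration(points: List[Point], centers: List[Point]) -> Tuple[List[Point], List[int]]:
    """One Lloyd iteration: assign every point to its nearest center, then
    move each center to the mean of the points assigned to it."""
    k, d = len(centers), len(centers[0])
    # Assignment step: one streaming pass over all n points; each point is
    # compared with all k centers, so the centers are reused n times.
    labels = [min(range(k), key=lambda j: squared_distance(p, centers[j])) for p in points]
    # Update step: per-cluster coordinate sums and counts.
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p, j in zip(points, labels):
        counts[j] += 1
        for t in range(d):
            sums[j][t] += p[t]
    # Assumption (hypothetical convention): an empty cluster keeps its previous center.
    new_centers = [
        tuple(s / counts[j] for s in sums[j]) if counts[j] else centers[j]
        for j in range(k)
    ]
    return new_centers, labels


def lloyd(points: List[Point], centers: List[Point], max_iters: int = 100) -> Tuple[List[Point], List[int]]:
    """Repeat Lloyd iterations until the centers stop moving (or max_iters is hit)."""
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(max_iters):
        new_centers, labels = lloyd_iteration(points, centers)
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    print(lloyd(pts, [(0.0, 0.0), (10.0, 10.0)]))
```

Under a simplified external-memory-style model with block size B (our assumption, in the spirit of [1], and not the exact model of [13]), the assignment pass reads roughly nd/B blocks per iteration as long as the k·d center coordinates stay resident in fast memory, while performing O(nkd) arithmetic.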
We also contribute a semi-formalization of the computational properties that are necessary and sufficient to predict a performance boost from scratchpad-aware variants of algorithms. We have observed and studied these properties in the context of sorting, and now clustering.
We conclude with some thoughts on the application of these properties to new areas. Specifically, we believe that dense linear algebra has properties similar to those of k-means, while sparse linear algebra and FFT computations are more similar to sorting. Sparse operations are more common in scientific computing, so we expect scratchpad to have significant impact in that area.
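One way to see why k-means resembles dense linear algebra (our illustration, not an argument taken from the paper) is that the assignment step's squared distances can be rewritten as ‖x − c‖² = ‖x‖² − 2 x·c + ‖c‖², so the dominant cross term over all point/center pairs is the dense product X Cᵀ of the n×d point matrix X and the k×d center matrix C. The sketch below uses hypothetical helper names and plain Python lists; it checks that the two formulations agree, whereas a real implementation would hand the cross term to an optimized matrix-multiply routine.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sq_dists_direct(X: Sequence[Sequence[float]], C: Sequence[Sequence[float]]) -> Matrix:
    """Reference version: ||x - c||^2 computed coordinate by coordinate."""
    return [[sum((xi - ci) ** 2 for xi, ci in zip(x, c)) for c in C] for x in X]


def sq_dists_via_product(X: Sequence[Sequence[float]], C: Sequence[Sequence[float]]) -> Matrix:
    """Same n-by-k matrix via ||x||^2 - 2 x.c + ||c||^2.

    The table cross[i][j] = X[i] . C[j] is exactly the dense product X C^T,
    which costs O(nkd); the two norm vectors cost only O((n + k)d).
    """
    x_norms = [dot(x, x) for x in X]
    c_norms = [dot(c, c) for c in C]
    cross = [[dot(x, c) for c in C] for x in X]  # dense X C^T
    return [
        [x_norms[i] - 2.0 * cross[i][j] + c_norms[j] for j in range(len(C))]
        for i in range(len(X))
    ]


def assign(D: Matrix) -> List[int]:
    """Index of the nearest center for each row of a distance matrix."""
    return [min(range(len(row)), key=row.__getitem__) for row in D]
```

Taking the argmin of each row of either matrix gives the cluster assignment. The expanded form can lose floating-point precision through cancellation when points lie far from the origin relative to their spread, which is one reason implementations sometimes center the data first.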

References

[1] A. Aggarwal and J. S. Vitter. The input/output complexity of sorting and related problems. Communications of the ACM, 31(9):1116--1127, Sept. 1988.
[2] N. Ailon, R. Jaiswal, and C. Monteleoni. Streaming k-means approximation. In 23rd Annual Conference on Neural Information Processing Systems (NIPS), pages 10--18, 2009.
[3] D. Ajwani, N. Sitchinava, and N. Zeh. Geometric algorithms for private-cache chip multiprocessors. In Proceedings of the Eighteenth Annual European Symposium on Algorithms (ESA), pages 75--86, 2010.
[4] D. Aloise, A. Deshpande, P. Hansen, and P. Popat. NP-hardness of Euclidean sum-of-squares clustering. Machine Learning, 75(2):245--248, 2009.
[5] L. Arge, M. T. Goodrich, M. Nelson, and N. Sitchinava. Fundamental parallel algorithms for private-cache chip multiprocessors. In Proceedings of the Twentieth Annual Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 197--206, 2008.
[6] L. Arge, M. T. Goodrich, and N. Sitchinava. Parallel external memory graph algorithms. In Proceedings of the IEEE International Symposium on Parallel & Distributed Processing (IPDPS), pages 1--11, 2010.
[7] D. Arthur and S. Vassilvitskii. k-means++: the advantages of careful seeding. In Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1027--1035, 2007.
[8] V. Arya, N. Garg, R. Khandekar, A. Meyerson, K. Munagala, and V. Pandit. Local search heuristics for k-median and facility location problems. SIAM J. Comput., 33(3):544--562, 2004.
[9] P. Awasthi, M. Charikar, R. Krishnaswamy, and A. K. Sinop. The hardness of approximation of Euclidean k-means. CoRR, abs/1502.03316, 2015.
[10] B. Bahmani, B. Moseley, A. Vattani, R. Kumar, and S. Vassilvitskii. Scalable k-means++. PVLDB, 5(7):622--633, 2012.
[11] R. Banakar, S. Steinke, B.-S. Lee, M. Balakrishnan, and P. Marwedel. Scratchpad Memory: A Design Alternative for Cache On-chip Memory in Embedded Systems. In Proceedings of the Tenth International Symposium on Hardware/Software Codesign (CODES), pages 73--78, 2002.
[12] R. E. Bellman. Dynamic Programming. Princeton University Press, 1957.
[13] M. A. Bender, J. Berry, S. D. Hammond, K. S. Hemmert, S. McCauley, B. Moore, B. Moseley, C. A. Phillips, D. Resnick, and A. Rodrigues. Two-level main memory co-design: Multi-threaded algorithmic primitives, analysis, and simulation. In Proceedings of the 29th IEEE International Parallel and Distributed Processing Symposium (IPDPS), Hyderabad, India, May 2015.
[14] M. A. Bender, R. Ebrahimi, J. T. Fineman, G. Ghasemiesfeh, R. Johnson, and S. McCauley. Cache-adaptive algorithms. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 116--128, 2014.
[15] G. S. Brodal, E. D. Demaine, J. T. Fineman, J. Iacono, S. Langerman, and J. I. Munro. Cache-oblivious dynamic dictionaries with update/query tradeoffs. In Proceedings of the Twenty-First Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 1448--1456, 2010.
[16] G. S. Brodal and R. Fagerberg. On the limits of cache-obliviousness. In Proceedings of the Thirty-Fifth Annual ACM Symposium on Theory of Computing (STOC), pages 307--315, 2003.
[17] M. Charikar, S. Guha, É. Tardos, and D. B. Shmoys. A constant-factor approximation algorithm for the k-median problem. J. Comput. Syst. Sci., 65(1):129--149, 2002.
[18] M. Danilevsky and E. Koh. Information graph model and application to online advertising. In Proceedings of the 1st Workshop on User Engagement Optimization (UEO '13), pages 11--14, 2013.
[19] A. Ene, S. Im, and B. Moseley. Fast clustering using MapReduce. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 681--689, 2011.
[20] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. In Proceedings of the 40th Annual Symposium on Foundations of Computer Science (FOCS), pages 285--297, 1999.
[21] M. Frigo, C. E. Leiserson, H. Prokop, and S. Ramachandran. Cache-oblivious algorithms. ACM Transactions on Algorithms, 8(1):4, 2012.
[22] S. Guha, A. Meyerson, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams: Theory and practice. IEEE Trans. Knowl. Data Eng., 15(3):515--528, 2003.
[23] S. Guha, N. Mishra, R. Motwani, and L. O'Callaghan. Clustering data streams. In Proceedings of the 41st Annual Symposium on Foundations of Computer Science (FOCS), pages 359--366, 2000.
[24] http://www.hpcwire.com/2014/06/24/micron-intel-reveal-memory-slice-knights-landing/.
[25] W. Liao. Parallel k-means data clustering (software), 2011.
[26] M. Lichman. UCI machine learning repository, 2013.
[27] S. Lloyd. Least squares quantization in PCM. IEEE Trans. Inf. Theory, 28(2):129--137, Mar. 1982.
[28] H. M. Moftah, W. H. Elmasry, N. El-Bendary, A. E. Hassanien, and K. Nakamatsu. Evaluating the effects of k-means clustering approach on medical images. In 12th International Conference on Intelligent Systems Design and Applications (ISDA 2012), Kochi, India, November 27--29, 2012, pages 455--459, 2012.
[29] http://nnsa.energy.gov/mediaroom/pressreleases/trinity.
[30] A. F. Rodrigues, K. S. Hemmert, B. W. Barrett, C. Kersey, R. Oldfield, M. Weston, R. Risen, J. Cook, P. Rosenfeld, E. Cooper-Balis, and B. Jacob. The Structural Simulation Toolkit. SIGMETRICS Perform. Eval. Rev., 38(4):37--42, Mar. 2011.
[31] N. Sitchinava and N. Zeh. A parallel buffer tree. In Proceedings of the Twenty-Fourth ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), pages 214--223, 2012.
[32] S. Steinke, L. Wehmeyer, B. Lee, and P. Marwedel. Assigning program and data objects to scratchpad for energy reduction. In Proceedings of the Conference on Design, Automation and Test in Europe (DATE), pages 409--415, 2002.
[33] M. Thorup. Quick k-median, k-center, and facility location for sparse graphs. SIAM J. Comput., 34(2):405--432, 2004.
[34] http://insidehpc.com/2014/07/cray-wins-174-million-contract-trinity-supercomputer-based-knights-landing.
[35] G. C. Tseng. Penalized and weighted k-means for clustering with scattered objects and prior information in high-throughput biological data. Bioinformatics, 23(17):2247--2255, Aug. 2007.
[36] X. Wu, V. Kumar, J. R. Quinlan, J. Ghosh, Q. Yang, H. Motoda, G. J. McLachlan, A. F. M. Ng, B. Liu, P. S. Yu, Z. Zhou, M. Steinbach, D. J. Hand, and D. Steinberg. Top 10 algorithms in data mining. Knowl. Inf. Syst., 14(1):1--37, 2008.
[37] A. Zimek, E. Schubert, and H.-P. Kriegel. A survey on unsupervised outlier detection in high-dimensional numerical data. Statistical Analysis and Data Mining: The ASA Data Science Journal, 5(5):363--387, 2012.

Information & Contributors

Information

Published In

ACM Other conferences
MEMSYS '15: Proceedings of the 2015 International Symposium on Memory Systems
October 2015
278 pages
ISBN: 9781450336048
DOI: 10.1145/2818950
© 2015 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Scratchpad
  2. Two-Level-Memory
  3. Variable Bandwidth
  4. k-means

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

MEMSYS '15
MEMSYS '15: International Symposium on Memory Systems
October 5 - 8, 2015
Washington, DC, USA

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 64
  • Downloads (last 6 weeks): 17
Reflects downloads up to 14 Jan 2025

Cited By

  • (2023) Evaluating Machine Learning Workloads on Memory-Centric Computing Systems. 2023 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), pp. 35-49. DOI: 10.1109/ISPASS57527.2023.00013. Online publication date: Apr-2023.
  • (2020) On the efficiency of K-means clustering. Proceedings of the VLDB Endowment 14(2), pp. 163-175. DOI: 10.14778/3425879.3425887. Online publication date: 1-Oct-2020.
  • (2020) How to Manage High-Bandwidth Memory Automatically. Proceedings of the 32nd ACM Symposium on Parallelism in Algorithms and Architectures, pp. 187-199. DOI: 10.1145/3350755.3400233. Online publication date: 6-Jul-2020.
  • (2019) A Hybrid MPI/OpenMP Parallelization of K-Means Algorithms Accelerated Using the Triangle Inequality. IEEE Access 7, pp. 42280-42297. DOI: 10.1109/ACCESS.2019.2907885. Online publication date: 2019.
  • (2018) Large-scale hierarchical k-means for heterogeneous many-core supercomputers. Proceedings of the International Conference for High Performance Computing, Networking, Storage, and Analysis, pp. 1-11. DOI: 10.5555/3291656.3291674 (also 10.1109/SC.2018.00016). Online publication date: 11-Nov-2018.
  • (2018) Stake. Proceedings of the International Symposium on Memory Systems, pp. 365-376. DOI: 10.1145/3240302.3240307. Online publication date: 1-Oct-2018.
  • (2017) Identifying the potential of near data processing for Apache Spark. Proceedings of the International Symposium on Memory Systems, pp. 60-67. DOI: 10.1145/3132402.3132427. Online publication date: 2-Oct-2017.
  • (2017) Optimal data layout for block-level random accesses to scratchpad. 2017 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7. DOI: 10.1109/HPEC.2017.8091088. Online publication date: Sep-2017.
  • (2016) Challenges and Opportunities for Dataflow Processing on Exascale Computers. Proceedings of the Sixth Workshop on Data-Flow Execution Models for Extreme Scale Computing, pp. 1-5. DOI: 10.1145/3292533.3292537. Online publication date: 15-Sep-2016.