DOI: 10.1145/3332466.3374521
research-article

spECK: accelerating GPU sparse matrix-matrix multiplication through lightweight analysis

Published: 19 February 2020

Abstract

Sparse general matrix-matrix multiplication on GPUs is challenging because sparsity patterns vary widely between matrices. Existing solutions achieve good performance for certain types of matrices but fail to accelerate all kinds of matrices equally well. Our approach combines multiple strategies with dynamic parameter selection to choose and tune the best-fitting algorithm for each row of the matrix. This choice is supported by a lightweight, multi-level matrix analysis, which carefully balances analysis cost against expected performance gains. Our evaluation on thousands of matrices with various characteristics shows that we outperform all currently available solutions on 79% of all matrices with more than 15k products and that we achieve the second-best performance on 15%. For these matrices, our solution is on average 83% faster than the second-best approach and up to 25× faster than other state-of-the-art GPU implementations. Using our approach, applications can expect great performance independent of the matrices they work on.
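To make the idea of per-row strategy selection concrete, the following is a minimal CPU-side sketch in Python, not the paper's GPU implementation. It assumes matrices in CSR form, uses the number of intermediate products per row as the lightweight analysis, and switches between a dense and a hash-based accumulator. The names `products_per_row`, `spgemm_rowwise`, and the `dense_threshold` parameter are hypothetical and chosen only for illustration; the actual criteria and kernels used by spECK may differ.

```python
from typing import Dict, List, Tuple

# CSR matrix as (row_ptr, col_idx, values); an illustrative layout, not spECK's.
CSR = Tuple[List[int], List[int], List[float]]


def products_per_row(a: CSR, b: CSR) -> List[int]:
    """Cheap analysis pass: count the intermediate products of each row of A*B."""
    a_ptr, a_col, _ = a
    b_ptr, _, _ = b
    counts = []
    for i in range(len(a_ptr) - 1):
        # Each nonzero A[i, k] is multiplied with every nonzero in row k of B.
        counts.append(sum(b_ptr[k + 1] - b_ptr[k]
                          for k in a_col[a_ptr[i]:a_ptr[i + 1]]))
    return counts


def spgemm_rowwise(a: CSR, b: CSR, n_cols_b: int,
                   dense_threshold: float = 0.5) -> Tuple[CSR, List[str]]:
    """Row-wise (Gustavson-style) product with a per-row accumulator choice.

    dense_threshold is a hypothetical tuning knob: rows whose product count
    reaches this fraction of B's column count use a dense accumulator.
    """
    a_ptr, a_col, a_val = a
    b_ptr, b_col, b_val = b
    c_ptr, c_col, c_val = [0], [], []
    strategies = []
    for i, n_products in enumerate(products_per_row(a, b)):
        use_dense = n_products >= dense_threshold * n_cols_b
        strategies.append("dense" if use_dense else "hash")
        dense_acc = [0.0] * n_cols_b if use_dense else []
        touched = [False] * n_cols_b if use_dense else []
        hash_acc: Dict[int, float] = {}
        for p in range(a_ptr[i], a_ptr[i + 1]):
            k = a_col[p]
            for q in range(b_ptr[k], b_ptr[k + 1]):
                j = b_col[q]
                if use_dense:
                    dense_acc[j] += a_val[p] * b_val[q]
                    touched[j] = True
                else:
                    hash_acc[j] = hash_acc.get(j, 0.0) + a_val[p] * b_val[q]
        if use_dense:
            # Dense rows are emitted by scanning the accumulator in column order.
            for j in range(n_cols_b):
                if touched[j]:
                    c_col.append(j)
                    c_val.append(dense_acc[j])
        else:
            # Hash rows need an explicit sort to produce ordered CSR output.
            for j in sorted(hash_acc):
                c_col.append(j)
                c_val.append(hash_acc[j])
        c_ptr.append(len(c_col))
    return (c_ptr, c_col, c_val), strategies
```

In this sketch the analysis pass only reads row pointers, so its cost stays small relative to the multiplication itself, which mirrors the trade-off between analysis cost and expected gain described in the abstract.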

Published In

PPoPP '20: Proceedings of the 25th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
February 2020
454 pages
ISBN: 9781450368186
DOI: 10.1145/3332466
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication Notes

Badge change: Article originally badged under Version 1.0 guidelines https://www.acm.org/publications/policies/artifact-review-badging

Conference

PPoPP '20

Acceptance Rates

PPoPP '20 Paper Acceptance Rate: 28 of 121 submissions, 23%
Overall Acceptance Rate: 230 of 1,014 submissions, 23%

Cited By

  • (2024) Predicting optimal sparse general matrix-matrix multiplication algorithm on GPUs. International Journal of High Performance Computing Applications 38:3 (245-259). DOI: 10.1177/10943420241231928. Online publication date: 1-May-2024
  • (2024) SaSpGEMM: Sorting-Avoiding Sparse General Matrix-Matrix Multiplication on Multi-Core Processors. Proceedings of the 53rd International Conference on Parallel Processing (1166-1175). DOI: 10.1145/3673038.3673054. Online publication date: 12-Aug-2024
  • (2024) A Coordinated Strategy for GNN Combining Computational Graph and Operator Optimizations. Proceedings of the 38th ACM International Conference on Supercomputing (460-472). DOI: 10.1145/3650200.3661896. Online publication date: 30-May-2024
  • (2024) FSpGEMM: A Framework for Accelerating Sparse General Matrix–Matrix Multiplication Using Gustavson’s Algorithm on FPGAs. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:4 (633-644). DOI: 10.1109/TVLSI.2024.3355499. Online publication date: Apr-2024
  • (2024) PruneGNN: Algorithm-Architecture Pruning Framework for Graph Neural Network Acceleration. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (108-123). DOI: 10.1109/HPCA57654.2024.00019. Online publication date: 2-Mar-2024
  • (2024) Seer: Predictive Runtime Kernel Selection for Irregular Problems. Proceedings of the 2024 IEEE/ACM International Symposium on Code Generation and Optimization (133-142). DOI: 10.1109/CGO57630.2024.10444812. Online publication date: 2-Mar-2024
  • (2024) Optimizing sparse general matrix–matrix multiplication for DCUs. The Journal of Supercomputing 80:14 (20176-20200). DOI: 10.1007/s11227-024-06234-2. Online publication date: 30-May-2024
  • (2023) A Survey of Accelerating Parallel Sparse Linear Algebra. ACM Computing Surveys 56:1 (1-38). DOI: 10.1145/3604606. Online publication date: 28-Aug-2023
  • (2023) Efficient Execution of SpGEMM on Long Vector Architectures. Proceedings of the 32nd International Symposium on High-Performance Parallel and Distributed Computing (101-113). DOI: 10.1145/3588195.3593000. Online publication date: 7-Aug-2023
  • (2023) Dissecting Tensor Cores via Microbenchmarks: Latency, Throughput and Numeric Behaviors. IEEE Transactions on Parallel and Distributed Systems 34:1 (246-261). DOI: 10.1109/TPDS.2022.3217824. Online publication date: 1-Jan-2023
