DOI: 10.1145/3210259.3210263
Research article · Public Access

Regularizing irregularity: bitmap-based and portable sparse matrix multiplication for graph data on GPUs

Published: 10 June 2018

Abstract

Graphs can be naturally represented as sparse matrices. The relationship between graph algorithms and linear algebra is well understood, and many graph problems can be abstracted as Sparse General Matrix-Matrix Multiplication (SpGEMM) operations. Although many matrix storage formats, including bitmap-based ones, have been proposed for sparse matrices, they have mostly been evaluated on the simpler Sparse Matrix-Vector Multiplication (SpMV) problem. In this study, we develop data parallel algorithms that pair up bitmap-indexed sparse matrix blocks for SpGEMM, built on data parallel primitives for portability. In experiments on the WebBase-1M dataset, which has more than one million rows and columns and three million non-zero values, our technique squares the large sparse matrix in about 300 milliseconds on a 2013 GTX Titan GPU. This runtime is 2.4X faster than CUSP and 3.5X faster than bhSPARSE, the two leading open-source SpGEMM packages. Furthermore, our bitmap-indexed sparse matrix blocks can be efficiently converted to small regular dense matrices, which can subsequently exploit new hardware accelerations, such as the tensor cores in Nvidia Volta GPUs and Google Tensor Processing Units (TPUs), for more efficient implementations.
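To make the idea concrete, the following is a minimal sketch of a bitmap-indexed block format and block pairing for SpGEMM. It is an illustration of the general technique, not the paper's exact format or its GPU implementation: the tile size, the dictionary-based block storage, and the function names (`to_bitmap_tiles`, `tile_to_dense`, `spgemm`) are all assumptions made for this example, and the data parallel primitives of the actual GPU algorithm are replaced by plain loops.

```python
# Illustrative sketch: a sparse matrix is cut into 4x4 tiles; each non-empty
# tile is stored as a 16-bit occupancy bitmap plus its non-zero values in
# row-major order. SpGEMM then reduces to pairing tiles A[i,k] with tiles
# B[k,j] on the shared index k and multiplying small dense blocks.
import numpy as np

TILE = 4  # tile edge length; an 8x8 tile would give a 64-bit bitmap

def to_bitmap_tiles(m):
    """Convert a dense numpy matrix into {(tile_row, tile_col): (bitmap, values)}."""
    tiles = {}
    for ti in range(0, m.shape[0], TILE):
        for tj in range(0, m.shape[1], TILE):
            block = m[ti:ti+TILE, tj:tj+TILE]
            if not block.any():
                continue  # empty tiles are simply absent
            bitmap, values = 0, []
            for r in range(TILE):
                for c in range(TILE):
                    if block[r, c] != 0:
                        bitmap |= 1 << (r * TILE + c)
                        values.append(block[r, c])
            tiles[(ti // TILE, tj // TILE)] = (bitmap, values)
    return tiles

def tile_to_dense(bitmap, values):
    """Expand one bitmap-indexed tile back into a small dense block, as one
    would before handing it to a dense accelerator such as a tensor core."""
    block = np.zeros((TILE, TILE))
    it = iter(values)
    for pos in range(TILE * TILE):
        if bitmap >> pos & 1:
            block[pos // TILE, pos % TILE] = next(it)
    return block

def spgemm(tiles_a, tiles_b, n_tiles):
    """Multiply two tiled sparse matrices by pairing A[i,k] with B[k,j]."""
    out = {}
    for (i, k), (bm_a, v_a) in tiles_a.items():
        for j in range(n_tiles):
            if (k, j) in tiles_b:
                bm_b, v_b = tiles_b[(k, j)]
                prod = tile_to_dense(bm_a, v_a) @ tile_to_dense(bm_b, v_b)
                out[(i, j)] = out.get((i, j), np.zeros((TILE, TILE))) + prod
    return out
```

The expansion step in `tile_to_dense` mirrors the abstract's closing observation: because each tile is a fixed-size dense block once expanded, the per-tile multiplications map naturally onto dense hardware units.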




Published In

GRADES-NDA '18: Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA)
June 2018
94 pages
ISBN: 9781450356954
DOI: 10.1145/3210259
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

  1. GPU
  2. bitmap-indexing
  3. data parallel design
  4. graph operations
  5. sparse matrix multiplication

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18

Acceptance Rates

GRADES-NDA '18 paper acceptance rate: 10 of 26 submissions, 38%
Overall acceptance rate: 29 of 61 submissions, 48%

Article Metrics

  • Downloads (last 12 months): 250
  • Downloads (last 6 weeks): 36
Reflects downloads up to 16 Feb 2025

Cited By

  • (2024) BCB-SpTC: An Efficient Sparse High-Dimensional Tensor Contraction Employing Tensor Core Acceleration. IEEE Transactions on Parallel and Distributed Systems 35(12):2435-2448, Dec 2024. DOI: 10.1109/TPDS.2024.3477746
  • (2024) Improve the Utility of Tensor Cores by Compacting Sparse Matrix Technique. 2024 14th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 14-19, 19 Nov 2024. DOI: 10.1109/ICCKE65377.2024.10874581
  • (2024) Enhancing the Sparse Matrix Storage Using Reordering Techniques. High Performance Computing, pp. 66-76, 28 Jan 2024. DOI: 10.1007/978-3-031-52186-7_5
  • (2023) Trajectory-based Metaheuristics for Improving Sparse Matrix Storage. 2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI), pp. 1-6, 29 Oct 2023. DOI: 10.1109/LA-CCI58595.2023.10409303
  • (2023) SGCN: Exploiting Compressed-Sparse Features in Deep Graph Convolutional Network Accelerators. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), pp. 1-14, Feb 2023. DOI: 10.1109/HPCA56546.2023.10071102
  • (2023) Sparse Matrix-Vector Product for the bmSparse Matrix Format in GPUs. Euro-Par 2023: Parallel Processing Workshops, pp. 246-256, 28 Aug 2023. DOI: 10.1007/978-3-031-50684-0_19
  • (2022) A Pattern-Based SpGEMM Library for Multi-Core and Many-Core Architectures. IEEE Transactions on Parallel and Distributed Systems 33(1):159-175, 1 Jan 2022. DOI: 10.1109/TPDS.2021.3090328
  • (2022) Towards an Efficient Sparse Storage Format for the SpMM Kernel in GPUs. Euro-Par 2021: Parallel Processing Workshops, pp. 104-115, 9 Jun 2022. DOI: 10.1007/978-3-031-06156-1_9
  • (2022) Advancing on an efficient sparse matrix multiplication kernel for modern GPUs. Concurrency and Computation: Practice and Experience 35(20), 19 Aug 2022. DOI: 10.1002/cpe.7271
  • (2021) Unleashing the performance of bmSparse for the sparse matrix multiplication in GPUs. 2021 12th Workshop on Latest Advances in Scalable Algorithms for Large-Scale Systems (ScalA), pp. 19-26, Nov 2021. DOI: 10.1109/ScalA54577.2021.00008
