DOI: 10.1145/3229710.3229720 (ICPP conference proceedings, research article)

High-Performance Sparse Matrix-Matrix Products on Intel KNL and Multicore Architectures

Published: 13 August 2018

Abstract

Sparse matrix-matrix multiplication (SpGEMM) is a computational primitive that is widely used in areas ranging from traditional numerical applications to recent big data analysis and machine learning. Although many SpGEMM algorithms have been proposed, hardware-specific optimizations for multi- and many-core processors are lacking, and a detailed analysis of their performance under various use cases and matrices is not available. We first identify and mitigate multiple bottlenecks in memory management and thread scheduling on Intel Xeon Phi (Knights Landing, or KNL). Specifically targeting multi- and many-core processors, we develop a hash-table-based algorithm and optimize a heap-based shared-memory SpGEMM algorithm. We examine their performance together with other publicly available codes. Unlike previous studies, our evaluation also includes use cases that are representative of real graph algorithms, such as multi-source breadth-first search and triangle counting. Our hash-table-based and heap-based algorithms achieve significant speedups over existing libraries in the majority of cases, while different algorithms dominate other scenarios depending on matrix size, sparsity, compression factor, and operation type. We distill our in-depth evaluation results into a recipe for selecting the best SpGEMM algorithm for a target scenario. A critical finding is that hash-table-based SpGEMM gains a significant performance boost if the nonzeros are not required to be sorted within each row of the output matrix.
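The hash-table-based approach the abstract describes follows the classic row-wise (Gustavson-style) formulation of SpGEMM: each output row is accumulated in a hash table keyed by column index, and sorting that row is an optional final step. The sketch below illustrates the idea under that assumption; the CSR layout, function name, and `sort_rows` flag are illustrative, not the paper's actual implementation.

```python
# Row-wise (Gustavson-style) SpGEMM with a hash-table accumulator.
# A and B are given in CSR form: row-pointer, column-index, and value arrays.

def spgemm_hash(a_ptr, a_idx, a_val, b_ptr, b_idx, b_val, sort_rows=True):
    """Compute C = A * B for CSR matrices A and B; returns CSR arrays of C."""
    c_ptr, c_idx, c_val = [0], [], []
    n_rows = len(a_ptr) - 1
    for i in range(n_rows):
        acc = {}  # hash table: output column index -> accumulated value
        # For each nonzero A[i, k], scale row k of B and merge into the table.
        for k in range(a_ptr[i], a_ptr[i + 1]):
            col_a, val_a = a_idx[k], a_val[k]
            for j in range(b_ptr[col_a], b_ptr[col_a + 1]):
                acc[b_idx[j]] = acc.get(b_idx[j], 0.0) + val_a * b_val[j]
        # Sorting each row is optional; skipping it corresponds to the
        # unsorted-output mode the abstract reports as a significant boost.
        cols = sorted(acc) if sort_rows else list(acc)
        c_idx.extend(cols)
        c_val.extend(acc[c] for c in cols)
        c_ptr.append(len(c_idx))
    return c_ptr, c_idx, c_val
```

For example, multiplying A = [[1, 2], [0, 3]] (CSR: ptr [0, 2, 3], idx [0, 1, 1], val [1, 2, 3]) by the 2x2 identity returns A itself in CSR form.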




    Published In

    cover image ACM Other conferences
    ICPP Workshops '18: Workshop Proceedings of the 47th International Conference on Parallel Processing
    August 2018
    409 pages
    ISBN:9781450365239
    DOI:10.1145/3229710
    Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

    In-Cooperation

• University of Oregon

    Publisher

    Association for Computing Machinery

    New York, NY, United States



    Author Tags

    1. Intel KNL
    2. SpGEMM
    3. Sparse matrix

    Qualifiers

    • Research-article
    • Research
    • Refereed limited

    Conference

    ICPP '18 Comp

    Acceptance Rates

    Overall Acceptance Rate 91 of 313 submissions, 29%


    Article Metrics

    • Downloads (Last 12 months)51
    • Downloads (Last 6 weeks)1
    Reflects downloads up to 11 Jan 2025


    Cited By

    • (2024) POSTER: Optimizing Sparse Tensor Contraction with Revisiting Hash Table Design. Proceedings of the 29th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming, 457-459. DOI: 10.1145/3627535.3638500. Online publication date: 2-Mar-2024.
    • (2024) SPMSD: An Partitioning-Strategy for Parallel General Sparse Matrix-Matrix Multiplication on GPU. Parallel Processing Letters 34:02. DOI: 10.1142/S012962642450004X. Online publication date: 27-May-2024.
    • (2024) Secure and efficient general matrix multiplication on cloud using homomorphic encryption. The Journal of Supercomputing 80:18, 26394-26434. DOI: 10.1007/s11227-024-06428-8. Online publication date: 26-Aug-2024.
    • (2023) A New Sparse GEneral Matrix-matrix Multiplication Method for Long Vector Architecture by Hierarchical Row Merging. Proceedings of the SC '23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis, 756-759. DOI: 10.1145/3624062.3625131. Online publication date: 12-Nov-2023.
    • (2023) A Tensor Marshaling Unit for Sparse Tensor Algebra on General-Purpose Processors. Proceedings of the 56th Annual IEEE/ACM International Symposium on Microarchitecture, 1332-1346. DOI: 10.1145/3613424.3614284. Online publication date: 28-Oct-2023.
    • (2023) SAGE: A Storage-Based Approach for Scalable and Efficient Sparse Generalized Matrix-Matrix Multiplication. Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 923-933. DOI: 10.1145/3583780.3615044. Online publication date: 21-Oct-2023.
    • (2023) Algorithm 1037: SuiteSparse:GraphBLAS: Parallel Graph Algorithms in the Language of Sparse Linear Algebra. ACM Transactions on Mathematical Software 49:3, 1-30. DOI: 10.1145/3577195. Online publication date: 19-Sep-2023.
    • (2023) Fast Sparse GPU Kernels for Accelerated Training of Graph Neural Networks. 2023 IEEE International Parallel and Distributed Processing Symposium (IPDPS), 501-511. DOI: 10.1109/IPDPS54959.2023.00057. Online publication date: May-2023.
    • (2023) DeltaSPARSE: High-Performance Sparse General Matrix-Matrix Multiplication on Multi-GPU Systems. 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), 194-202. DOI: 10.1109/HiPC58850.2023.00037. Online publication date: 18-Dec-2023.
    • (2023) Optimizing massively parallel sparse matrix computing on ARM many-core processor. Parallel Computing 117, 103035. DOI: 10.1016/j.parco.2023.103035. Online publication date: Sep-2023.
