research-article

CSX: an extended compression format for SpMV on shared memory systems

Published: 12 February 2011

Abstract

The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory systems with multiple processing units due to the streaming nature of its data access pattern. Previous research has demonstrated that an effective strategy to improve the kernel's performance is to drastically reduce the data volume involved in the computations. Since the storage formats for sparse matrices include metadata describing the structure of non-zero elements within the matrix, we propose a generalized approach to compress metadata by exploiting substructures within the matrix. We call the proposed storage format Compressed Sparse eXtended (CSX). In our implementation, we employ runtime code generation to construct specialized SpMV routines for each matrix. Experimental evaluation on two shared memory systems for 15 sparse matrices demonstrates significant performance gains as the number of participating cores increases. Regarding the cost of CSX construction, we propose several strategies that trade performance for preprocessing cost, making CSX applicable to both online and offline preprocessing.
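The idea of shrinking index metadata by exploiting substructure can be illustrated with a small sketch. The code below is not the CSX format itself, whose encoding the abstract does not specify; it assumes a plain CSR baseline and one simple kind of substructure (runs of consecutive column indices within a row). All function names are hypothetical and chosen for illustration only.

```python
from typing import List, Tuple

def csr_spmv(row_ptr: List[int], col_ind: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Baseline SpMV over CSR: one stored column index per non-zero."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(y)):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += values[k] * x[col_ind[k]]
    return y

def horizontal_runs(row_ptr: List[int],
                    col_ind: List[int]) -> List[List[Tuple[int, int]]]:
    """Illustrative assumption: replace each row's column indices by
    (start_col, length) pairs describing runs of consecutive columns."""
    runs = []
    for i in range(len(row_ptr) - 1):
        row_runs = []
        k, end = row_ptr[i], row_ptr[i + 1]
        while k < end:
            start, length = col_ind[k], 1
            while k + length < end and col_ind[k + length] == start + length:
                length += 1
            row_runs.append((start, length))
            k += length
        runs.append(row_runs)
    return runs

def run_spmv(runs: List[List[Tuple[int, int]]],
             values: List[float], x: List[float]) -> List[float]:
    """SpMV driven by the run descriptors instead of per-element indices."""
    y = [0.0] * len(runs)
    k = 0  # position in the values array, which keeps CSR order
    for i, row_runs in enumerate(runs):
        for start, length in row_runs:
            for j in range(length):
                y[i] += values[k] * x[start + j]
                k += 1
    return y

# A 3x6 example matrix with 13 non-zeros, stored in CSR.
row_ptr = [0, 6, 9, 13]
col_ind = [0, 1, 2, 3, 4, 5, 2, 3, 4, 0, 3, 4, 5]
values = [float(v) for v in range(1, 14)]
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]

runs = horizontal_runs(row_ptr, col_ind)
index_ints_csr = len(col_ind)                      # one integer per non-zero
index_ints_runs = 2 * sum(len(r) for r in runs)    # two integers per run
y_csr = csr_spmv(row_ptr, col_ind, values, x)
y_runs = run_spmv(runs, values, x)
```

On this toy matrix the run encoding stores fewer index integers than CSR while producing the same result vector; on a matrix without consecutive columns it would store more, which is why a format of this kind has to detect which substructures are worth encoding.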




    Published In

    ACM SIGPLAN Notices, Volume 46, Issue 8
    PPoPP '11
    August 2011
    300 pages
    ISSN: 0362-1340
    EISSN: 1558-1160
    DOI: 10.1145/2038037

    PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
    February 2011
    326 pages
    ISBN: 9781450301190
    DOI: 10.1145/1941553
    General Chair: Calin Cascaval
    Program Chair: Pen-Chung Yew
    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. compression
    2. shared memory
    3. smp
    4. sparse matrix-vector multiplication
    5. spmv
