
CSX: an extended compression format for spmv on shared memory systems

Authors:

Kornilios Kourtis,

Vasileios Karakasis,

Georgios Goumas,

Nectarios Koziris

ACM SIGPLAN Notices, Volume 46, Issue 8

Pages 247 - 256

https://doi.org/10.1145/2038037.1941587

Published: 12 February 2011

Abstract

The Sparse Matrix-Vector multiplication (SpMV) kernel scales poorly on shared memory systems with multiple processing units due to the streaming nature of its data access pattern. Previous research has demonstrated that an effective strategy for improving the kernel's performance is to drastically reduce the data volume involved in the computations. Since the storage formats for sparse matrices include metadata describing the structure of non-zero elements within the matrix, we propose a generalized approach that compresses this metadata by exploiting substructures within the matrix. We call the proposed storage format Compressed Sparse eXtended (CSX). In our implementation we employ runtime code generation to construct specialized SpMV routines for each matrix. Experimental evaluation on two shared memory systems for 15 sparse matrices demonstrates significant performance gains as the number of participating cores increases. Regarding the cost of CSX construction, we propose several strategies that trade performance for preprocessing cost, making CSX applicable to both online and offline preprocessing.
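To make the notion of metadata concrete, the sketch below shows a baseline SpMV over the widely used CSR layout, where `row_ptr` and `col_ind` are the index metadata streamed alongside the values, followed by a toy encoder that collapses runs of column indices with a constant stride into `(start, delta, length)` units. This is only an illustration of the general idea of exploiting substructures; it is not the CSX format or the authors' implementation, and the function names, the unit layout, and the `min_len` threshold are all assumptions made for this example.

```python
from typing import List, Tuple

Unit = Tuple[int, int, int]  # (start_col, delta, length); hypothetical layout


def csr_spmv(row_ptr: List[int], col_ind: List[int],
             values: List[float], x: List[float]) -> List[float]:
    """Compute y = A @ x for a matrix A stored in CSR form."""
    y = [0.0] * (len(row_ptr) - 1)
    for i in range(len(row_ptr) - 1):
        acc = 0.0
        # One column index is read for every non-zero value.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            acc += values[k] * x[col_ind[k]]
        y[i] = acc
    return y


def horizontal_units(cols: List[int], min_len: int = 3) -> List[Unit]:
    """Group one row's sorted column indices into constant-stride units.

    Runs of at least `min_len` indices with the same delta become a single
    unit; every other index is emitted as a one-element unit (delta 0).
    """
    units: List[Unit] = []
    i, n = 0, len(cols)
    while i < n:
        j = i + 1
        if j < n:
            delta = cols[j] - cols[i]
            # Extend the run while the stride stays the same.
            while j + 1 < n and cols[j + 1] - cols[j] == delta:
                j += 1
            length = j - i + 1
            if length >= min_len:
                units.append((cols[i], delta, length))
                i = j + 1
                continue
        units.append((cols[i], 0, 1))
        i += 1
    return units


def expand_units(units: List[Unit]) -> List[int]:
    """Recover the original column indices from the units."""
    cols: List[int] = []
    for start, delta, length in units:
        cols.extend(start + delta * t for t in range(length))
    return cols


if __name__ == "__main__":
    # 3x3 example: [[1, 0, 2], [0, 3, 0], [4, 0, 5]]
    print(csr_spmv([0, 2, 3, 5], [0, 2, 1, 0, 2],
                   [1.0, 2.0, 3.0, 4.0, 5.0], [1.0, 1.0, 1.0]))
    print(horizontal_units([0, 1, 2, 3, 7, 20, 22, 24]))
```

In this toy encoding, eight column indices shrink to three units, which hints at why compressing index metadata can reduce the data volume the kernel has to stream; the actual savings depend on the matrix structure and on how the units are packed.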



Index Terms

CSX: an extended compression format for spmv on shared memory systems
1. Applied computing
  1. Physical sciences and engineering



Published In


ACM SIGPLAN Notices Volume 46, Issue 8

PPoPP '11

August 2011

300 pages

ISSN:0362-1340

EISSN:1558-1160

DOI:10.1145/2038037


PPoPP '11: Proceedings of the 16th ACM symposium on Principles and practice of parallel programming
February 2011
326 pages
ISBN:9781450301190
DOI:10.1145/1941553
General Chair: Calin Cascaval (Qualcomm Research, USA)
Program Chair: Pen-Chung Yew (Academia Sinica, Taiwan and University of Minnesota at Twin Cities, USA)

Copyright © 2011 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States



