Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs

Monakov, Alexander; Avetisyan, Arutyun

doi:10.1007/978-3-642-03138-0_32

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 5657)

Included in the following conference series:

International Workshop on Embedded Computer Systems

Abstract

We discuss implementing blocked sparse matrix-vector multiplication for NVIDIA GPUs. We outline an algorithm and various optimizations, and identify potential future improvements and challenging tasks. Compared with a previously published implementation, ours is faster on matrices with many high fill-ratio blocks but slower on matrices with a low number of non-zero elements per row.

Author information

Authors and Affiliations

Institute for System Programming of RAS, Moscow, Russia
Alexander Monakov & Arutyun Avetisyan

Editor information

Editors and Affiliations

Delft University of Technology, Mekelweg 4, 2628 CD Delft, The Netherlands
Koen Bertels & Stephan Wong
Department of Electrical and Computer Engineering, University of Victoria, P.O. Box 3055, Victoria, BC V8W 3P6, Canada
Nikitas Dimopoulos
Dipartimento di Elettronica e Informazione, Politecnico di Milano, P.za Leonardo Da Vinci 32, 20133, Milan, Italy
Cristina Silvano

About this paper

Cite this paper

Monakov, A., Avetisyan, A. (2009). Implementing Blocked Sparse Matrix-Vector Multiplication on NVIDIA GPUs. In: Bertels, K., Dimopoulos, N., Silvano, C., Wong, S. (eds) Embedded Computer Systems: Architectures, Modeling, and Simulation. SAMOS 2009. Lecture Notes in Computer Science, vol 5657. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-03138-0_32

DOI: https://doi.org/10.1007/978-3-642-03138-0_32
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-03137-3
Online ISBN: 978-3-642-03138-0
eBook Packages: Computer Science, Computer Science (R0)
