Abstract
A wide range of applications in engineering and scientific computing rely on sparse matrix computation. A variety of data representations exist for storing the non-zero elements of sparse matrices, and each representation favors some matrices while performing poorly on others. Existing studies tend to process all types of applications, e.g., the most common kernel, matrix–vector multiplication, with a fixed representation regardless of the sparsity structure of the input. Moreover, although Graphics Processing Units (GPUs) have evolved into a very attractive platform for general-purpose computation, most existing work on sparse matrix–vector multiplication (SpMV, for short) targets CPUs. In this work, we design and implement an adaptive GPU-based SpMV scheme that selects the best storage format for the input matrix, taking the configuration and characteristics of the GPU into account. We study the effect of various parameters and settings on the performance of SpMV applications when employing different data representations. We then employ an adaptive scheme to execute sparse matrix applications using the appropriate representation format. Evaluation results show that our run-time adaptive scheme properly adapts to different applications by selecting a suitable representation for each input sparse matrix. Preliminary results show that our adaptive scheme improves the performance of sparse matrix–vector multiplication by 2.1× for single-precision and 1.6× for double-precision formats, on average.
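To make the idea of representation-dependent SpMV performance concrete, the sketch below (not the paper's implementation; all names and the selection threshold are illustrative assumptions) builds a CSR representation, runs a reference SpMV over it, and applies a toy heuristic for choosing between CSR and ELL: ELL pads every row to the length of the longest row, so it wastes memory and bandwidth when row lengths vary widely.

```python
import numpy as np

def to_csr(dense):
    """Build CSR arrays (row_ptr, col_idx, vals) from a dense matrix."""
    rows, cols = dense.shape
    row_ptr = [0]
    col_idx, vals = [], []
    for i in range(rows):
        for j in range(cols):
            if dense[i, j] != 0:
                col_idx.append(j)
                vals.append(dense[i, j])
        row_ptr.append(len(vals))  # running count of non-zeros per row
    return np.array(row_ptr), np.array(col_idx), np.array(vals, dtype=float)

def csr_spmv(row_ptr, col_idx, vals, x):
    """Reference y = A @ x with A stored in CSR form."""
    y = np.zeros(len(row_ptr) - 1)
    for i in range(len(y)):
        start, end = row_ptr[i], row_ptr[i + 1]
        y[i] = np.dot(vals[start:end], x[col_idx[start:end]])
    return y

def choose_format(row_ptr):
    """Toy selection rule (illustrative threshold): ELL pads every row to
    the longest one, so prefer CSR when row lengths are highly irregular."""
    lengths = np.diff(row_ptr)
    return "ELL" if lengths.max() <= 2 * lengths.mean() else "CSR"
```

A real run-time scheme such as the one the paper describes would base the decision on GPU-specific characteristics (warp width, memory coalescing, occupancy) rather than a single row-length ratio, but the structure is the same: inspect the input matrix, then dispatch to the kernel matching the selected format.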
Cite this article
Zardoshti, P., Khunjush, F. & Sarbazi-Azad, H. Adaptive sparse matrix representation for efficient matrix–vector multiplication. J Supercomput 72, 3366–3386 (2016). https://doi.org/10.1007/s11227-015-1571-0