
Adaptive sparse matrix representation for efficient matrix–vector multiplication

The Journal of Supercomputing

Abstract

Many applications in engineering and scientific computing are built on sparse matrix computation. A variety of data representations exist for storing the non-zero elements of a sparse matrix, and each representation favors some matrices while performing poorly on others. Existing studies tend to process all applications, including the most popular one, sparse matrix–vector multiplication (SpMV), with a single fixed representation regardless of the sparse matrix structure. Although Graphics Processing Units (GPUs) have evolved into a very attractive platform for general-purpose computation, most existing work on SpMV targets CPUs. In this work, we design and implement an adaptive GPU-based SpMV scheme that selects the best format for the input matrix with the configuration and characteristics of GPUs in mind. We study how various parameters and settings affect SpMV performance under different data representations. We then employ an adaptive scheme to execute different sparse matrix applications using suitable representation formats. Evaluation results show that our run-time adaptive scheme adapts to different applications by selecting an appropriate representation for each input sparse matrix. Preliminary results show that the adaptive scheme improves the performance of sparse matrix multiplications by 2.1× for single precision and 1.6× for double precision, on average.
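
To make the idea of choosing a representation per input matrix concrete, here is a minimal CPU-side sketch in Python. It is not the authors' scheme. The two formats (CSR and ELL), the row-length heuristic, the threshold `max_cv`, and all function names are assumptions chosen for illustration. The abstract does not say which formats or selection criteria the paper uses.

```python
from statistics import mean, pstdev


def dense_to_csr(a):
    """Convert a dense list-of-lists matrix to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end offset of this row
    return values, col_idx, row_ptr


def spmv_csr(values, col_idx, row_ptr, x):
    """y = A @ x with A stored in CSR."""
    return [
        sum(values[k] * x[col_idx[k]] for k in range(row_ptr[i], row_ptr[i + 1]))
        for i in range(len(row_ptr) - 1)
    ]


def csr_to_ell(values, col_idx, row_ptr):
    """Pad every row to the longest row length (ELL layout)."""
    n_rows = len(row_ptr) - 1
    width = max((row_ptr[i + 1] - row_ptr[i] for i in range(n_rows)), default=0)
    ell_vals = [[0] * width for _ in range(n_rows)]
    ell_cols = [[0] * width for _ in range(n_rows)]  # padding points at column 0
    for i in range(n_rows):
        for slot, k in enumerate(range(row_ptr[i], row_ptr[i + 1])):
            ell_vals[i][slot] = values[k]
            ell_cols[i][slot] = col_idx[k]
    return ell_vals, ell_cols


def spmv_ell(ell_vals, ell_cols, x):
    """y = A @ x with A stored in ELL; padded slots contribute 0."""
    return [
        sum(v * x[c] for v, c in zip(row_v, row_c))
        for row_v, row_c in zip(ell_vals, ell_cols)
    ]


def choose_format(row_ptr, max_cv=0.25):
    """Hypothetical heuristic: prefer ELL when row lengths are nearly uniform
    (low coefficient of variation), otherwise fall back to CSR."""
    lengths = [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]
    avg = mean(lengths) if lengths else 0
    if avg == 0:
        return "csr"
    return "ell" if pstdev(lengths) / avg <= max_cv else "csr"


def adaptive_spmv(a, x):
    """Pick a format from the matrix structure at run time, then multiply."""
    values, col_idx, row_ptr = dense_to_csr(a)
    fmt = choose_format(row_ptr)
    if fmt == "ell":
        ell_vals, ell_cols = csr_to_ell(values, col_idx, row_ptr)
        return fmt, spmv_ell(ell_vals, ell_cols, x)
    return fmt, spmv_csr(values, col_idx, row_ptr, x)
```

The intuition behind this particular heuristic is that padding rows to a common width wastes little memory when row lengths are similar and gives a regular access pattern, whereas one long row makes the padding expensive. A real GPU implementation would also need to weigh device characteristics, which this sketch does not model.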

Author information

Correspondence to Pantea Zardoshti.

Cite this article

Zardoshti, P., Khunjush, F. & Sarbazi-Azad, H. Adaptive sparse matrix representation for efficient matrix–vector multiplication. J Supercomput 72, 3366–3386 (2016). https://doi.org/10.1007/s11227-015-1571-0
