
Efficiently Running SpMV on Multi-core DSPs for Banded Matrix

  • Conference paper
  • First Online:
Algorithms and Architectures for Parallel Processing (ICA3PP 2023)

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 14491)

Abstract

Sparse matrix-vector multiplication (SpMV) plays a pivotal role in large-scale scientific computing. Despite the increasing use of low-power multi-core digital signal processors (DSPs) in high-performance computing (HPC) systems, optimizing SpMV on these platforms has been largely overlooked. This paper targets the FT-M7032, a new multi-core CPU-DSP heterogeneous processor for high-performance computing. The FT-M7032 provides programmable memory units at multiple levels, but utilizing these units effectively is challenging. To address this, we evaluate the transfer capability between the different units and use the results to map matrix elements onto the storage hierarchy. Based on this evaluation, we propose an efficient parallel implementation, SpMV_Band, designed specifically for banded matrices. Furthermore, we devise a computation pipeline that reduces memory access overhead by overlapping data transfers with computation. To evaluate our approach, we compare its performance against a baseline executed on the general-purpose CPU cores of the FT-M7032 heterogeneous platform. Experimental results demonstrate that our techniques achieve a speedup of 2.0× over this baseline.
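The full text is paywalled, but the kernel the paper optimizes is easy to illustrate. The sketch below is not the authors' SpMV_Band implementation; it is a minimal, serial banded SpMV in C using DIA (diagonal) storage, a common layout for banded matrices, and all names in it are hypothetical. It only shows the y = A·x computation that SpMV_Band would parallelize across the DSP cores and overlap with data transfers on the FT-M7032.

/* Minimal sketch of banded SpMV with DIA (diagonal) storage.
 * NOT the authors' SpMV_Band kernel; serial reference code only.
 * offsets[d] is the diagonal offset (0 = main, <0 below, >0 above);
 * vals holds ndiag * n entries with vals[d*n + i] = A(i, i + offsets[d]). */
#include <stdio.h>

void spmv_dia(int n, int ndiag, const int *offsets,
              const double *vals, const double *x, double *y)
{
    for (int i = 0; i < n; ++i) {
        double sum = 0.0;
        for (int d = 0; d < ndiag; ++d) {
            int j = i + offsets[d];          /* column touched by this diagonal */
            if (j >= 0 && j < n)
                sum += vals[d * n + i] * x[j];
        }
        y[i] = sum;
    }
}

int main(void)
{
    /* 4x4 tridiagonal example: bandwidth 1, diagonals at -1, 0, +1. */
    enum { N = 4, NDIAG = 3 };
    int offsets[NDIAG] = { -1, 0, 1 };
    double vals[NDIAG * N] = {
        0, 1, 1, 1,   /* sub-diagonal (row 0 entry is unused padding)   */
        2, 2, 2, 2,   /* main diagonal                                   */
        1, 1, 1, 0    /* super-diagonal (row 3 entry is unused padding)  */
    };
    double x[N] = { 1, 1, 1, 1 }, y[N];

    spmv_dia(N, NDIAG, offsets, vals, x, y);
    for (int i = 0; i < N; ++i)
        printf("y[%d] = %g\n", i, y[i]);   /* expected: 3 4 4 3 */
    return 0;
}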




Acknowledgement

This work was supported in part by the National Key R&D Program of China under grant agreement 2021YFB0300101, the National Science Foundation of China (NSFC) under grant agreements 61902411, 62032023, 12002382, 11275269, 42104078, and 62073333, and the Excellent Youth Foundation of Hunan Province under grant agreement 2021JJ10050.

Author information


Corresponding authors

Correspondence to Shengguo Li or Dezun Dong.



Copyright information

© 2024 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.

About this paper


Cite this paper

Bi, D., Li, S., Zhang, Y., Yang, X., Dong, D. (2024). Efficiently Running SpMV on Multi-core DSPs for Banded Matrix. In: Tari, Z., Li, K., Wu, H. (eds) Algorithms and Architectures for Parallel Processing. ICA3PP 2023. Lecture Notes in Computer Science, vol 14491. Springer, Singapore. https://doi.org/10.1007/978-981-97-0808-6_12


  • DOI: https://doi.org/10.1007/978-981-97-0808-6_12

  • Published:

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-97-0807-9

  • Online ISBN: 978-981-97-0808-6

  • eBook Packages: Computer Science, Computer Science (R0)
