DOI: 10.1109/PACT52795.2021.00016

Research article

InnerSP: A Memory Efficient Sparse Matrix Multiplication Accelerator with Locality-aware Inner Product Processing

Published: 26 November 2024

Abstract

Sparse matrix multiplication is one of the key computational kernels in large-scale data analytics. A naive implementation, however, suffers from irregular memory accesses caused by the sparse representation. To mitigate these overheads, recent accelerator designs have advocated outer product processing, which minimizes input accesses but generates intermediate products that must be merged into the final output matrix. Using real-world sparse matrices, this study first identifies a memory bloating problem in outer product designs caused by the unpredictable volume of intermediate products. Such an unpredictable increase in memory requirements during computation can limit the applicability of accelerators. To address the memory bloating problem, this study revisits the alternative inner product approach and proposes a new accelerator design called InnerSP. The study shows that nonzero element distributions in real-world sparse matrices exhibit a certain level of locality, which a smart caching scheme designed for inner products can exploit effectively with a modest on-chip cache. However, the row-wise inner product relies on on-chip aggregation of intermediate products, and uneven sparsity per row can cause overflows or underflows of the on-chip aggregation storage. To maximize parallelism while avoiding costly overflows, the proposed accelerator uses pre-scanning for row splitting and merging. Simulation results show that InnerSP matches or exceeds the performance of prior outer product approaches without any memory bloating.
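To illustrate the dataflow contrast the abstract describes (not the paper's hardware design), the following sketch implements both SpGEMM orderings on a simple dict-of-dicts sparse format, where a matrix maps row index to {column index: value}. The row-wise (inner-product-style) version aggregates each output row before writing it out, so partial sums stay bounded per row; the outer-product version materializes one partial matrix per shared dimension index and merges them afterwards, which is where unpredictable intermediate storage can accumulate.

```python
def rowwise_spgemm(A, B):
    """Row-wise product: accumulate each output row of C = A * B fully
    on the fly, mirroring on-chip aggregation of one row at a time."""
    C = {}
    for i, row in A.items():
        acc = {}  # per-row accumulator: bounded by the row's output nonzeros
        for k, a_ik in row.items():
            for j, b_kj in B.get(k, {}).items():
                acc[j] = acc.get(j, 0) + a_ik * b_kj
        if acc:
            C[i] = acc
    return C


def outer_spgemm(A, B):
    """Outer product: column k of A times row k of B yields a partial
    matrix; all partials are kept and merged, so intermediate storage
    grows with the (input-dependent) number of partial products."""
    cols = {}  # transpose A into column-major form
    for i, row in A.items():
        for k, v in row.items():
            cols.setdefault(k, {})[i] = v
    partials = []
    for k, col in cols.items():
        part = {i: {j: v * b_kj for j, b_kj in B.get(k, {}).items()}
                for i, v in col.items()}
        partials.append(part)
    C = {}  # merge phase: combine all partial matrices into the output
    for part in partials:
        for i, row in part.items():
            for j, v in row.items():
                C.setdefault(i, {})[j] = C.get(i, {}).get(j, 0) + v
    return C


# Tiny example: A is 2x3, B is 3x2 (only nonzeros stored).
A = {0: {0: 1.0, 2: 2.0}, 1: {1: 3.0}}
B = {0: {1: 4.0}, 1: {0: 5.0}, 2: {1: 6.0}}
assert rowwise_spgemm(A, B) == outer_spgemm(A, B)
```

Both orderings compute the same product; they differ in when intermediate products are reduced, which is the distinction InnerSP's locality-aware inner-product design exploits.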



Published In

PACT '21: Proceedings of the 30th International Conference on Parallel Architectures and Compilation Techniques
September 2021, 370 pages
ISBN: 9781665442787
Editor: Jaejin Lee

Publisher

IEEE Press

Author Tags

  1. hardware accelerator
  2. inner product
  3. sparse matrix multiplication

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PACT '21
Acceptance Rates

Overall Acceptance Rate 121 of 471 submissions, 26%
