DOI: 10.1145/3381427.3381428
research-article

Sparse Matrix-Dense Matrix Multiplication on Heterogeneous CPU+FPGA Embedded System

Published: 16 March 2020

Abstract

Embedded intelligence is becoming the primary driver for new applications in industry, healthcare, and automotive, to name a few. These applications are characterized by high computational demand, real-time interaction with the environment, security, low power consumption, and local autonomy, among others. To address these diverse requirements, researchers have proposed heterogeneous multicore embedded systems comprising CPUs, GPUs, FPGAs, and ASICs. Although each computing element provides a unique capability that serves one of these requirements, making the processing elements collaborate on a single application to achieve maximum performance remains a crucial challenge. This paper considers the collaborative use of a multicore CPU and an FPGA in a heterogeneous embedded system to improve the performance of sparse matrix operations, which are essential for reducing inference complexity in machine learning, especially in deep convolutional neural networks. Experimental results show that collaborative execution of sparse-matrix-dense-matrix multiplication on the Xilinx Zynq MPSoC, a heterogeneous CPU+FPGA embedded system, improves performance by up to 42% compared with using the FPGA alone as an accelerator.
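To make the idea of collaborative execution concrete, the following is a minimal sketch of sparse-matrix-dense-matrix multiplication in which the rows of the sparse matrix are split between two workers. It is an illustration only, not the paper's implementation: the CSR storage format, the split by nonzero count, and every function name here (`dense_to_csr`, `spmdm_rows`, `split_by_nonzeros`, `collaborative_spmdm`) are assumptions, and both "devices" are simulated by the same plain Python routine run one after the other.

```python
from typing import List, Tuple

Matrix = List[List[float]]
# Hypothetical CSR container: (values, column indices, row pointers).
CSR = Tuple[List[float], List[int], List[int]]


def dense_to_csr(a: Matrix) -> CSR:
    """Convert a dense row-major matrix to CSR, dropping zero entries."""
    values, col_idx, row_ptr = [], [], [0]
    for row in a:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def spmdm_rows(csr: CSR, b: Matrix, row_start: int, row_end: int) -> Matrix:
    """Compute rows [row_start, row_end) of the product A * B, with A in CSR."""
    values, col_idx, row_ptr = csr
    n_cols = len(b[0])
    out = []
    for i in range(row_start, row_end):
        acc = [0.0] * n_cols
        # Each nonzero A[i][c] scales row c of B and accumulates into row i.
        for k in range(row_ptr[i], row_ptr[i + 1]):
            v = values[k]
            b_row = b[col_idx[k]]
            for j in range(n_cols):
                acc[j] += v * b_row[j]
        out.append(acc)
    return out


def split_by_nonzeros(row_ptr: List[int], fraction: float) -> int:
    """Return s such that rows [0, s) hold roughly `fraction` of the nonzeros."""
    n_rows = len(row_ptr) - 1
    target = fraction * row_ptr[-1]
    s = 0
    while s < n_rows and row_ptr[s] < target:
        s += 1
    return s


def collaborative_spmdm(csr: CSR, b: Matrix, fraction: float = 0.5) -> Matrix:
    """Split the rows of A between two workers and concatenate their results.

    Assumption: on real hardware the two calls would run concurrently on
    different devices; here they run sequentially on the CPU.
    """
    n_rows = len(csr[2]) - 1
    split = split_by_nonzeros(csr[2], fraction)
    part_accel = spmdm_rows(csr, b, 0, split)       # stand-in for the accelerator share
    part_cpu = spmdm_rows(csr, b, split, n_rows)    # stand-in for the CPU share
    return part_accel + part_cpu
```

Because the output rows are independent, the two partial results can simply be concatenated. In a sketch like this, the `fraction` parameter is the knob one would tune so that both workers finish at about the same time.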


Cited By

  • (2024) Real-time Blood Pressure Prediction on Wearables with Edge-Based DNNs: A Co-Design Approach. ACM Transactions on Design Automation of Electronic Systems 30:1 (1-24). DOI: 10.1145/3699512. Online publication date: 7-Oct-2024
  • (2023) A Survey of Accelerating Parallel Sparse Linear Algebra. ACM Computing Surveys 56:1 (1-38). DOI: 10.1145/3604606. Online publication date: 28-Aug-2023

Published In

PARMA-DITAM'2020: Proceedings of the 11th Workshop on Parallel Programming and Run-Time Management Techniques for Many-core Architectures / 9th Workshop on Design Tools and Architectures for Multicore Embedded Computing Platforms
January 2020
30 pages
ISBN:9781450375450
DOI:10.1145/3381427

In-Cooperation

  • HiPEAC: HiPEAC Network of Excellence

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. Embedded FPGA
  2. Heterogeneous System
  3. High-level Synthesis
  4. Sparse Matrix

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

PARMA-DITAM'2020

Acceptance Rates

PARMA-DITAM'2020 Paper Acceptance Rate 5 of 9 submissions, 56%;
Overall Acceptance Rate 11 of 24 submissions, 46%
