DOI: 10.1145/3289602.3293988
Poster · Public Access

XFER: A Novel Design to Achieve Super-Linear Performance on Multiple FPGAs for Real-Time AI

Published: 20 February 2019

Abstract

Real-time inference with low-latency requirements has become increasingly important for numerous applications in both cloud and edge computing. FPGA-based Deep Neural Network (DNN) accelerators have demonstrated superior performance and energy efficiency over CPUs and GPUs; moreover, for real-time AI with small batch sizes, FPGAs are expected to deliver further performance improvements over general-purpose computing platforms. However, the performance gain of a single-FPGA design is hindered by limited on-chip resources. In this paper, we leverage a cluster of FPGAs to fully exploit the parallelism in DNNs with the objective of obtaining super-linear performance. To achieve this goal, we propose a novel design, "XFER", which deploys DNNs onto an FPGA cluster by splitting each DNN layer across multiple FPGAs and moving traffic from the memory bus to inter-FPGA links. The resulting system achieves both workload balance and traffic balance. As a case study, we implement Convolutional Neural Networks (CNNs) on ZCU102 FPGA boards. Evaluation results demonstrate that XFER on two FPGAs achieves a 3.48x speedup over state-of-the-art single-FPGA designs, i.e., super-linear speedup.
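
The layer-splitting idea described in the abstract can be illustrated with a minimal NumPy sketch. This is only an illustrative sketch, not the XFER implementation: the two "FPGAs" below are plain function calls, the output-channel split is just one possible way to partition a layer, and the names (conv2d, split_layer_across_two_devices) are hypothetical. The point is that each device needs only its own slice of the weights, and only the finished output slices would need to cross the inter-FPGA link.

    import numpy as np

    def conv2d(x, w):
        """Valid cross-correlation as used in CNN layers.
        x: input feature map, shape (C_in, H, W)
        w: weights, shape (C_out, C_in, K, K)
        returns: output feature map, shape (C_out, H-K+1, W-K+1)"""
        c_out, c_in, k, _ = w.shape
        out_h, out_w = x.shape[1] - k + 1, x.shape[2] - k + 1
        y = np.zeros((c_out, out_h, out_w))
        for o in range(c_out):
            for i in range(out_h):
                for j in range(out_w):
                    y[o, i, j] = np.sum(x[:, i:i + k, j:j + k] * w[o])
        return y

    def split_layer_across_two_devices(x, w):
        """Hypothetical intra-layer split: each device computes a disjoint
        slice of the output channels, so each weight slice stays local to
        one device and only the two output slices travel over the link."""
        half = w.shape[0] // 2
        y0 = conv2d(x, w[:half])   # would run on FPGA 0
        y1 = conv2d(x, w[half:])   # would run on FPGA 1
        return np.concatenate([y0, y1], axis=0)  # gathered over the inter-FPGA link

    # The split layer produces exactly the same result as the unsplit layer.
    x = np.random.rand(3, 8, 8)
    w = np.random.rand(4, 3, 3, 3)
    assert np.allclose(split_layer_across_two_devices(x, w), conv2d(x, w))

Under this kind of split, each device holds half the weights and produces half the output channels, so doubling the devices also doubles the usable on-chip resources; the abstract's 3.48x speedup on two FPGAs exceeds the 2x that linear scaling would predict, which is what the paper calls super-linear speedup.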




Published In

FPGA '19: Proceedings of the 2019 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2019, 360 pages
ISBN: 9781450361378
DOI: 10.1145/3289602

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. multi-fpga cluster
  2. real-time inference
  3. super-linear performance

Qualifiers

  • Poster


Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%


Cited By

  • (2022) Hardware-friendly compression and hardware acceleration for transformer: A survey. Electronic Research Archive, 30(10):3755-3785. DOI: 10.3934/era.2022192
  • (2021) Co-Exploration of Graph Neural Network and Network-on-Chip Design Using AutoML. Proceedings of the 2021 Great Lakes Symposium on VLSI, 175-180. DOI: 10.1145/3453688.3461741
  • (2021) Accommodating Transformer onto FPGA. Proceedings of the 2021 Great Lakes Symposium on VLSI, 163-168. DOI: 10.1145/3453688.3461739
  • (2021) Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization. 2021 IEEE/ACM International Conference On Computer Aided Design (ICCAD), 1-9. DOI: 10.1109/ICCAD51958.2021.9643586
  • (2021) Energy-Efficient Deep Neural Networks Implementation on a Scalable Heterogeneous FPGA Cluster. 2021 IEEE 15th International Conference on Anti-counterfeiting, Security, and Identification (ASID), 10-15. DOI: 10.1109/ASID52932.2021.9651719
  • (2020) ConfuciuX: Autonomous Hardware Resource Assignment for DNN Accelerators using Reinforcement Learning. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 622-636. DOI: 10.1109/MICRO50266.2020.00058
  • (2019) When Neural Architecture Search Meets Hardware Implementation: from Hardware Awareness to Co-Design. 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), 25-30. DOI: 10.1109/ISVLSI.2019.00014
  • (2019) A Module-Level Pipeline Implementation Based on Inter-Board Heterogeneous. 2019 IEEE 4th International Conference on Integrated Circuits and Microsystems (ICICM), 280-286. DOI: 10.1109/ICICM48536.2019.8977153
  • (2019) Machine Vision Guided 3D Medical Image Compression for Efficient Transmission and Accurate Segmentation in the Clouds. 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 12679-12688. DOI: 10.1109/CVPR.2019.01297
  • (2020) Co-Exploring Neural Architecture and Network-on-Chip Design for Real-Time Artificial Intelligence. 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC), 85-90. DOI: 10.1109/ASP-DAC47756.2020.9045595
