DOI: 10.1145/3125502.3125534

A power-efficient and high performance FPGA accelerator for convolutional neural networks: work-in-progress

Published: 15 October 2017

Abstract

Recently, FPGAs have been widely used to implement hardware accelerators for Convolutional Neural Networks (CNNs), especially on mobile and embedded devices. However, most existing accelerators are designed with the same concept as their ASIC counterparts: all operations from different CNN layers are mapped to the same hardware units and executed in a time-multiplexed way. Although this approach improves the generality of these accelerators, it does not take full advantage of the reconfigurability and customizability of FPGAs, resulting in a certain degree of computational-efficiency degradation, which is even worse on embedded platforms. In this paper, we propose an FPGA-based CNN accelerator in which every layer is mapped to its own on-chip unit, and all units work concurrently as a pipeline. A strategy that finds an optimized parallelism scheme for each layer is proposed to eliminate pipeline stalls and achieve high resource utilization. In addition, a balanced pruning-based method is applied to the fully connected (FC) layers to reduce computational redundancy. As a case study, we implement a widely used CNN model, LeNet-5, on an embedded FPGA device, the Xilinx Zedboard. It achieves a peak performance of 39.78 GOP/s and a power efficiency of 19.6 GOP/s/W, outperforming previous approaches.
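The stall-free layer pipeline described above hinges on balancing per-layer parallelism so that every stage finishes in roughly the same time. The sketch below is a minimal illustration of that balancing idea, not the authors' actual optimization strategy; the function name, layer operation counts, and multiplier budget are hypothetical placeholders.

```python
# Illustrative sketch (assumptions, not from the paper): choose per-layer unroll
# factors proportional to each layer's operation count so that stage latencies
# (ops_i / unroll_i) come out roughly equal, the condition for a stall-free pipeline.

def balance_unroll_factors(layer_ops, total_multipliers):
    """Split a multiplier budget across pipeline stages in proportion to op counts."""
    total_ops = sum(layer_ops)
    unroll = [max(1, round(total_multipliers * ops / total_ops)) for ops in layer_ops]
    latencies = [ops / u for ops, u in zip(layer_ops, unroll)]
    # The slowest stage sets the pipeline's initiation interval.
    return unroll, max(latencies)

# Placeholder multiply-accumulate counts for five layers (not LeNet-5's real numbers):
ops = [240_000, 1_200_000, 480_000, 48_000, 10_000]
unroll, interval = balance_unroll_factors(ops, total_multipliers=220)
print(unroll, interval)
```

As a sanity check on the reported figures, 39.78 GOP/s at 19.6 GOP/s/W corresponds to roughly 39.78 / 19.6 ≈ 2.0 W of power.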



Published In

CODES '17: Proceedings of the Twelfth IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis Companion
October 2017
84 pages
ISBN:9781450351850
DOI:10.1145/3125502

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. CNNs
  2. FPGA-based accelerator
  3. pipelines
  4. power efficient

Qualifiers

  • Research-article

Conference

ESWEEK'17: Thirteenth Embedded Systems Week
October 15 - 20, 2017
Seoul, Republic of Korea

Acceptance Rates

Overall Acceptance Rate 280 of 864 submissions, 32%

Cited By

  • (2024) "A Post-Quantum Encryption Mechanism Based on Convolutional Neural Network Accelerator", IEEE Transactions on Circuits and Systems II: Express Briefs, 71(8):3945-3949. DOI: 10.1109/TCSII.2024.3377460
  • (2024) "A Trusted Inference Mechanism for Edge Computing Based on Post-Quantum Encryption", 2024 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5. DOI: 10.1109/ISCAS58744.2024.10557963
  • (2023) "A Convolutional Computing Design Using Pulsating Arrays", 2023 19th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD), pp. 1-5. DOI: 10.1109/ICNC-FSKD59587.2023.10281046
  • (2022) "WGeod: A General and Efficient FPGA Accelerator for Object Detection", 2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pp. 730-738. DOI: 10.1109/ISPA-BDCloud-SocialCom-SustainCom57177.2022.00099
  • (2022) "A Survey of FPGA-Based Deep Learning Acceleration Research", The International Conference on Image, Vision and Intelligent Systems (ICIVIS 2021), pp. 59-65. DOI: 10.1007/978-981-16-6963-7_5
  • (2021) "Enhancing Performance of Gabriel Graph-Based Classifiers by a Hardware Co-Processor for Embedded System Applications", IEEE Transactions on Industrial Informatics, 17(2):1186-1196. DOI: 10.1109/TII.2020.2987329
  • (2021) "A High Energy Efficiency and Low Resource Consumption FPGA Accelerator for Convolutional Neural Network", 2021 7th International Conference on Computer and Communications (ICCC), pp. 1278-1283. DOI: 10.1109/ICCC54389.2021.9674340
  • (2021) "Realization of convolution layer using system verilog for achieving parallelism and improvement in performance parameters", International Journal of Information Technology. DOI: 10.1007/s41870-021-00724-9
  • (2020) "An Inference Hardware Accelerator for EEG-Based Emotion Detection", 2020 IEEE International Symposium on Circuits and Systems (ISCAS), pp. 1-5. DOI: 10.1109/ISCAS45731.2020.9180728
  • (2020) "BioCNN: A Hardware Inference Engine for EEG-based Emotion Detection", IEEE Access. DOI: 10.1109/ACCESS.2020.3012900
