DOI: 10.1145/2847263.2847276

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Published: 21 February 2016

Abstract

Convolutional Neural Networks (CNNs) have gained popularity in many computer vision applications such as image classification, face detection, and video analysis because of their ability to train and classify with high accuracy. Because their multiple convolution and fully connected layers are compute- and memory-intensive, it is difficult to perform real-time classification with low power consumption on today's computing systems. FPGAs have been widely explored as hardware accelerators for CNNs because of their reconfigurability and energy efficiency, as well as their fast turnaround time, especially with high-level synthesis methodologies. Previous FPGA-based CNN accelerators, however, were typically generic designs agnostic to the CNN configuration, so the reconfigurable capability of FPGAs was not fully leveraged to maximize overall system throughput. In this work, we present a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering FPGA resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth. The proposed methodology is demonstrated by optimizing two representative large-scale CNNs, AlexNet and VGG, on two Altera Stratix-V FPGA platforms, the DE5-Net and P395-D8 boards, which have different hardware resources. We achieve a peak performance of 136.5 GOPS for the convolution operations and 117.8 GOPS for the entire VGG network performing ImageNet classification on the P395-D8 board.
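The abstract describes exploring a design space of accelerator configurations and keeping the highest-throughput point that fits the FPGA's resource budget. A minimal sketch of that idea, with an entirely illustrative cost model (the resource budget, the `resources`/`throughput_gops` formulas, and all constants are assumptions for illustration, not figures from the paper):

```python
# Hypothetical sketch of throughput-oriented design space exploration:
# enumerate candidate parallelism factors, discard any design point that
# violates a resource constraint, and keep the throughput maximum.
# All constants below are illustrative, not taken from the paper.

from itertools import product

# Illustrative FPGA budget (not the DE5-Net or P395-D8 figures).
BUDGET = {"dsp": 1024, "bram_kb": 4096, "bw_gbps": 12.0}

def resources(simd, cu):
    """Toy resource model: usage grows with SIMD width and compute units."""
    return {
        "dsp": simd * cu,             # one MAC per SIMD lane per unit
        "bram_kb": 64 * cu,           # local buffers per compute unit
        "bw_gbps": 0.05 * simd * cu,  # external-memory traffic estimate
    }

def throughput_gops(simd, cu, fmax_mhz=200):
    """Toy throughput model: 2 ops (multiply + add) per lane per cycle."""
    return 2 * simd * cu * fmax_mhz / 1e3

def explore(simd_opts=(2, 4, 8, 16), cu_opts=(1, 2, 4, 8, 16)):
    """Exhaustively search the (SIMD width, compute units) space."""
    best = None
    for simd, cu in product(simd_opts, cu_opts):
        use = resources(simd, cu)
        if any(use[k] > BUDGET[k] for k in BUDGET):
            continue  # infeasible: violates a resource constraint
        gops = throughput_gops(simd, cu)
        if best is None or gops > best[0]:
            best = (gops, simd, cu)
    return best

if __name__ == "__main__":
    gops, simd, cu = explore()
    print(f"best feasible point: SIMD={simd}, CUs={cu}, ~{gops:.1f} GOPS")
```

With this toy model the memory-bandwidth constraint, not the DSP count, caps the winning design point, which mirrors the abstract's point that external memory bandwidth must be considered alongside on-chip resources.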





      Published In

      FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
      February 2016
      298 pages
      ISBN:9781450338561
      DOI:10.1145/2847263

Publisher

Association for Computing Machinery, New York, NY, United States



      Author Tags

      1. convolutional neural networks
      2. fpga
      3. opencl
      4. optimization

      Qualifiers

      • Research-article

      Conference

FPGA '16

      Acceptance Rates

FPGA '16 Paper Acceptance Rate: 20 of 111 submissions, 18%
Overall Acceptance Rate: 125 of 627 submissions, 20%

Article Metrics

      • Downloads (Last 12 months)304
      • Downloads (Last 6 weeks)20
      Reflects downloads up to 27 Jul 2024

Cited By
• (2024) Quantization-Based Optimization Algorithm for Hardware Implementation of Convolution Neural Networks. Electronics 13:9 (1727). DOI: 10.3390/electronics13091727. Online publication date: 30-Apr-2024
• (2024) A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs. ACM Transactions on Design Automation of Electronic Systems 29:3 (1-25). DOI: 10.1145/3652855. Online publication date: 16-Mar-2024
• (2024) Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources. ACM Transactions on Architecture and Code Optimization 21:2 (1-26). DOI: 10.1145/3648475. Online publication date: 21-May-2024
• (2024) Advancing machine learning tasks with field-programmable gate arrays: advantages, applications, challenges, and future perspectives. Second International Conference on Electrical, Electronics, and Information Engineering (EEIE 2023) (26). DOI: 10.1117/12.3017278. Online publication date: 15-Jan-2024
• (2024) EDCompress: Energy-Aware Model Compression for Dataflows. IEEE Transactions on Neural Networks and Learning Systems 35:1 (208-220). DOI: 10.1109/TNNLS.2022.3172941. Online publication date: Jan-2024
• (2024) Towards Secure Runtime Customizable Trusted Execution Environment on FPGA-SoC. IEEE Transactions on Computers 73:4 (1138-1151). DOI: 10.1109/TC.2024.3355772. Online publication date: Apr-2024
• (2024) RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge. IEEE Internet of Things Journal 11:14 (24831-24845). DOI: 10.1109/JIOT.2024.3386832. Online publication date: 15-Jul-2024
• (2024) Bayesian Inference Accelerator for Spiking Neural Networks. 2024 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). DOI: 10.1109/ISCAS58744.2024.10558608. Online publication date: 19-May-2024
• (2024) A Convolutional Neural Network Accelerator with High-level Synthesis. 2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI) (38-43). DOI: 10.1109/ICETCI61221.2024.10594247. Online publication date: 24-May-2024
• (2024) A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge. Proceedings of the 29th Asia and South Pacific Design Automation Conference (927-932). DOI: 10.1109/ASP-DAC58780.2024.10473817. Online publication date: 22-Jan-2024
