DOI: 10.1145/2847263.2847276

Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks

Published: 21 February 2016

Abstract

Convolutional Neural Networks (CNNs) have gained popularity in many computer vision applications such as image classification, face detection, and video analysis because of their ability to train and classify with high accuracy. Because their multiple convolution and fully connected layers are compute- and memory-intensive, it is difficult to perform real-time classification with low power consumption on today's computing systems. FPGAs have been widely explored as hardware accelerators for CNNs because of their reconfigurability and energy efficiency, as well as their fast turnaround time, especially with high-level synthesis methodologies. Previous FPGA-based CNN accelerators, however, were typically generic designs agnostic to the CNN configuration, so the reconfigurable capability of FPGAs was not fully leveraged to maximize overall system throughput. In this work, we present a systematic design space exploration methodology to maximize the throughput of an OpenCL-based FPGA accelerator for a given CNN model, considering FPGA resource constraints such as on-chip memory, registers, computational resources, and external memory bandwidth. The proposed methodology is demonstrated by optimizing two representative large-scale CNNs, AlexNet and VGG, on two Altera Stratix-V FPGA platforms, the DE5-Net and P395-D8 boards, which have different hardware resources. We achieve a peak performance of 136.5 GOPS for the convolution operations and 117.8 GOPS for the entire VGG network performing ImageNet classification on the P395-D8 board.
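The abstract describes exploring a design space of accelerator configurations and keeping the highest-throughput point that fits the FPGA's resource budget. A minimal sketch of that idea, with an entirely illustrative cost model (the resource budget, the `resources`/`throughput_gops` formulas, and all constants are assumptions for illustration, not figures from the paper):

```python
# Hypothetical sketch of throughput-oriented design space exploration:
# enumerate candidate parallelism factors, discard any design point that
# violates a resource constraint, and keep the throughput maximum.
# All constants below are illustrative, not taken from the paper.

from itertools import product

# Illustrative FPGA budget (not the DE5-Net or P395-D8 figures).
BUDGET = {"dsp": 1024, "bram_kb": 4096, "bw_gbps": 12.0}

def resources(simd, cu):
    """Toy resource model: usage grows with SIMD width and compute units."""
    return {
        "dsp": simd * cu,             # one MAC per SIMD lane per unit
        "bram_kb": 64 * cu,           # local buffers per compute unit
        "bw_gbps": 0.05 * simd * cu,  # external-memory traffic estimate
    }

def throughput_gops(simd, cu, fmax_mhz=200):
    """Toy throughput model: 2 ops (multiply + add) per lane per cycle."""
    return 2 * simd * cu * fmax_mhz / 1e3

def explore(simd_opts=(2, 4, 8, 16), cu_opts=(1, 2, 4, 8, 16)):
    """Exhaustively search the (SIMD width, compute units) space."""
    best = None
    for simd, cu in product(simd_opts, cu_opts):
        use = resources(simd, cu)
        if any(use[k] > BUDGET[k] for k in BUDGET):
            continue  # infeasible: violates a resource constraint
        gops = throughput_gops(simd, cu)
        if best is None or gops > best[0]:
            best = (gops, simd, cu)
    return best

if __name__ == "__main__":
    gops, simd, cu = explore()
    print(f"best feasible point: SIMD={simd}, CUs={cu}, ~{gops:.1f} GOPS")
```

With this toy model the memory-bandwidth constraint, not the DSP count, caps the winning design point, which mirrors the abstract's point that external memory bandwidth must be considered alongside on-chip resources.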





      Published In

      FPGA '16: Proceedings of the 2016 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
      February 2016
      298 pages
      ISBN:9781450338561
      DOI:10.1145/2847263

Publisher

Association for Computing Machinery, New York, NY, United States



      Author Tags

      1. convolutional neural networks
      2. fpga
      3. opencl
      4. optimization

      Qualifiers

      • Research-article

      Conference

FPGA '16

      Acceptance Rates

FPGA '16 Paper Acceptance Rate: 20 of 111 submissions, 18%
Overall Acceptance Rate: 125 of 627 submissions, 20%

Article Metrics

      • Downloads (Last 12 months)304
      • Downloads (Last 6 weeks)20
      Reflects downloads up to 27 Jul 2024

Cited By
• (2024) Quantization-Based Optimization Algorithm for Hardware Implementation of Convolution Neural Networks. Electronics 13:9 (1727). DOI: 10.3390/electronics13091727. Online publication date: 30-Apr-2024
• (2024) A High-Performance Accelerator for Real-Time Super-Resolution on Edge FPGAs. ACM Transactions on Design Automation of Electronic Systems 29:3 (1-25). DOI: 10.1145/3652855. Online publication date: 16-Mar-2024
• (2024) Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources. ACM Transactions on Architecture and Code Optimization 21:2 (1-26). DOI: 10.1145/3648475. Online publication date: 21-May-2024
• (2024) Advancing machine learning tasks with field-programmable gate arrays: advantages, applications, challenges, and future perspectives. Second International Conference on Electrical, Electronics, and Information Engineering (EEIE 2023) (26). DOI: 10.1117/12.3017278. Online publication date: 15-Jan-2024
• (2024) EDCompress: Energy-Aware Model Compression for Dataflows. IEEE Transactions on Neural Networks and Learning Systems 35:1 (208-220). DOI: 10.1109/TNNLS.2022.3172941. Online publication date: Jan-2024
• (2024) Towards Secure Runtime Customizable Trusted Execution Environment on FPGA-SoC. IEEE Transactions on Computers 73:4 (1138-1151). DOI: 10.1109/TC.2024.3355772. Online publication date: Apr-2024
• (2024) RAMAN: A Reconfigurable and Sparse tinyML Accelerator for Inference on Edge. IEEE Internet of Things Journal 11:14 (24831-24845). DOI: 10.1109/JIOT.2024.3386832. Online publication date: 15-Jul-2024
• (2024) Bayesian Inference Accelerator for Spiking Neural Networks. 2024 IEEE International Symposium on Circuits and Systems (ISCAS) (1-5). DOI: 10.1109/ISCAS58744.2024.10558608. Online publication date: 19-May-2024
• (2024) A Convolutional Neural Network Accelerator with High-level Synthesis. 2024 IEEE 4th International Conference on Electronic Technology, Communication and Information (ICETCI) (38-43). DOI: 10.1109/ICETCI61221.2024.10594247. Online publication date: 24-May-2024
• (2024) A Precision-Scalable RISC-V DNN Processor with On-Device Learning Capability at the Extreme Edge. Proceedings of the 29th Asia and South Pacific Design Automation Conference (927-932). DOI: 10.1109/ASP-DAC58780.2024.10473817. Online publication date: 22-Jan-2024
