research-article
DOI:10.1145/3204919.3204920

Performance-oriented Optimizations for OpenCL Streaming Kernels on the FPGA

Published: 14 May 2018

Abstract

Field-programmable gate arrays (FPGAs) can implement streaming applications efficiently, and high-level synthesis (HLS) tools allow people with little hardware design knowledge to evaluate an application on FPGAs. There is therefore a need to understand what role OpenCL and FPGAs can play in streaming domains. To this end, we explore the implementation space and discuss techniques for optimizing the performance of streaming kernels using the Intel FPGA SDK for OpenCL. On the Nallatech 385A FPGA platform, which features an Arria 10 GX1150 FPGA, the experimental results show that FPGA resources, such as block RAMs and DSPs, can limit the performance of a kernel before the memory bandwidth constraint takes effect. Kernel vectorization and compute unit duplication are practical optimization techniques that can improve kernel performance by a factor of 2.8 to 10. Combining the two techniques can improve performance by a factor of 3.3 to 16, achieving the highest performance. To improve the performance of streaming kernels with compute unit duplication, the local work size needs to be tuned. Compared with a duplicated kernel that has not been tuned, the optimal value can increase performance by a factor of 3 to 70.
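As a rough illustration of the three tuning knobs named in the abstract, the sketch below annotates a trivial streaming kernel with the kernel attributes that the Intel FPGA SDK for OpenCL provides for vectorization (`num_simd_work_items`), compute unit duplication (`num_compute_units`), and a fixed local work size (`reqd_work_group_size`). The kernel `stream_scale`, the macros `KERNEL`, `GLOBAL` and `FPGA_TUNING`, the helper `run_ndrange`, and the values 64, 4 and 2 are all assumptions made for this example; they are not taken from the paper. The host-side branch is only a serial stand-in so that the same file compiles as plain C.

```c
#ifdef __OPENCL_VERSION__
/* Device build: kernel attributes of the Intel FPGA SDK for OpenCL.
 * The values 64, 4 and 2 are placeholders, not the paper's settings. */
#define KERNEL __kernel
#define GLOBAL __global
#define FPGA_TUNING \
    __attribute__((reqd_work_group_size(64, 1, 1))) \
    __attribute__((num_simd_work_items(4)))         \
    __attribute__((num_compute_units(2)))
#else
/* Host build: hypothetical stand-ins so the sketch compiles as plain C. */
#include <stddef.h>
#define KERNEL
#define GLOBAL
#define FPGA_TUNING
static size_t emulated_global_id;
static size_t get_global_id(unsigned dim) { (void)dim; return emulated_global_id; }
#endif

/* A hypothetical streaming kernel: each work-item reads one element,
 * scales it, and writes one element, with no reuse between work-items. */
FPGA_TUNING
KERNEL void stream_scale(GLOBAL const float *in, GLOBAL float *out, float factor)
{
    size_t i = get_global_id(0);
    out[i] = in[i] * factor;
}

#ifndef __OPENCL_VERSION__
/* Serial emulation of a 1-D NDRange launch: one call per work-item. */
static void run_ndrange(const float *in, float *out, float factor, size_t global_size)
{
    for (emulated_global_id = 0; emulated_global_id < global_size; ++emulated_global_id)
        stream_scale(in, out, factor);
}
#endif
```

In a device build, the SIMD width is expected to divide the required work-group size evenly, and the global size passed by the host must be a multiple of that work-group size. Which values perform best depends on the kernel and on the available block RAMs and DSPs, which is why the abstract treats the local work size as something to be tuned rather than fixed.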

Published In

IWOCL '18: Proceedings of the International Workshop on OpenCL
May 2018
108 pages
ISBN:9781450364393
DOI:10.1145/3204919
Publication rights licensed to ACM. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

In-Cooperation

  • Huawei Technologies Co. Ltd.
  • Khronos Group
  • Xilinx Inc.
  • Codeplay Software Ltd.
  • Intel
  • The University of Bristol

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. OpenCL
  3. Streaming kernels

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

IWOCL '18: International Workshop on OpenCL
May 14 - 16, 2018
Oxford, United Kingdom

Acceptance Rates

IWOCL '18 Paper Acceptance Rate 16 of 33 submissions, 48%;
Overall Acceptance Rate 84 of 152 submissions, 55%

Article Metrics

  • Downloads (Last 12 months): 11
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 07 Mar 2025

Cited By

  • (2020) Performance Evaluation of the Vectorizable Binary Search Algorithms on an FPGA Platform. 2020 IEEE/ACM 10th Workshop on Irregular Applications: Architectures and Algorithms (IA3), pp. 63-67. DOI: 10.1109/IA351965.2020.00014. Online publication date: Nov-2020
  • (2020) Design and Performance Evaluation of Optimizations for OpenCL FPGA Kernels. 2020 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7. DOI: 10.1109/HPEC43674.2020.9286221. Online publication date: 22-Sep-2020
  • (2019) Simulation of Random Network of Hodgkin and Huxley Neurons with Exponential Synaptic Conductances on an FPGA Platform. Proceedings of the 10th ACM International Conference on Bioinformatics, Computational Biology and Health Informatics, pp. 653-657. DOI: 10.1145/3307339.3343460. Online publication date: 4-Sep-2019
  • (2019) PPOpenCL: a performance-portable OpenCL compiler with host and kernel thread code fusion. Proceedings of the 28th International Conference on Compiler Construction, pp. 2-16. DOI: 10.1145/3302516.3307350. Online publication date: 16-Feb-2019
  • (2019) Base64 Encoding on Heterogeneous Computing Platforms. 2019 IEEE 30th International Conference on Application-specific Systems, Architectures and Processors (ASAP), pp. 247-254. DOI: 10.1109/ASAP.2019.00014. Online publication date: Jul-2019
  • (2018) Bob Jenkins Lookup3 Hash Function on OpenCL FPGA Platform. 2018 IEEE International Conference on Big Data (Big Data), pp. 4736-4741. DOI: 10.1109/BigData.2018.8621960. Online publication date: Dec-2018
