Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs

Authors:

Jeffrey S. Vetter,

Allen D. Malony

ICS '18: Proceedings of the 2018 International Conference on Supercomputing

Pages 160 - 171

https://doi.org/10.1145/3205289.3205324

Published: 12 June 2018

Abstract

Reconfigurable architectures like Field Programmable Gate Arrays (FPGAs) have been used for accelerating computations from several domains because of their unique combination of flexibility, performance, and power efficiency. However, FPGAs have not been widely used for high-performance computing, primarily because of their programming complexity and difficulties in optimizing performance. In this paper, we present a directive-based, high-level optimization framework for high-performance computing with FPGAs, built on top of an OpenACC-to-FPGA translation framework called OpenARC. We propose directive extensions and corresponding compile-time optimization techniques to enable the compiler to generate more efficient FPGA hardware configuration files. Empirical evaluation of the proposed framework on an Intel Stratix V with five OpenACC benchmarks from various application domains shows that FPGA-specific optimizations can lead to significant increases in performance across all tested applications. We also demonstrate that applying these high-level directive-based optimizations can allow OpenACC applications to perform similarly to lower-level OpenCL applications with hand-written FPGA-specific optimizations, and offer runtime and power performance benefits compared to CPUs and GPUs.
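To make "directive-based" concrete, the following is a minimal sketch in C using only standard OpenACC directives. It is not taken from the paper and does not use any of the FPGA-specific directive extensions the paper proposes; the function name `saxpy` and its parameters are hypothetical, chosen only for illustration. The point is that the loop stays ordinary C, and a single pragma asks an OpenACC compiler to offload it and manage the data movement.

```c
#include <stddef.h>

/* Hypothetical example kernel: y[i] = a * x[i] + y[i] for i in [0, n).
 * The pragma is standard OpenACC: "parallel loop" marks the loop for
 * offloading, copyin() sends x to the device, copy() sends y to the
 * device and brings it back afterwards.
 * A compiler without OpenACC support ignores the pragma and runs the
 * loop sequentially on the host, producing the same result. */
void saxpy(int n, float a, const float *restrict x, float *restrict y)
{
    #pragma acc parallel loop copyin(x[0:n]) copy(y[0:n])
    for (int i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}
```

Calling `saxpy(4, 2.0f, x, y)` on two four-element arrays updates `y` in place. How a given OpenACC-to-FPGA toolchain maps such a loop to hardware depends on the compiler and is not shown here.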

Published In

ICS '18: Proceedings of the 2018 International Conference on Supercomputing

June 2018

407 pages

ISBN: 9781450357838

DOI: 10.1145/3205289

Copyright © 2018 ACM.

© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Sponsors

SIGARCH: ACM Special Interest Group on Computer Architecture

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

U.S. Department of Energy

Conference

ICS '18

Sponsor:

SIGARCH

ICS '18: 2018 International Conference on Supercomputing

June 12 - 15, 2018

Beijing, China

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Bibliometrics

Total Citations: 6

Total Downloads: 361

Downloads (Last 12 months): 53

Downloads (Last 6 weeks): 5

Reflects downloads up to 26 Sep 2024

Citations

Cited By

B. Birath, A. Ernstsson, J. Tinnerholm, and C. Kessler. 2024. High-Level Programming of FPGA-Accelerated Systems with Parallel Patterns. International Journal of Parallel Programming 52, 4 (2024), 253--273. Online publication date: 27 May 2024. https://doi.org/10.1007/s10766-024-00770-3

Y. Liang, Q. Xiao, L. Lu, and J. Xie. 2022. FCNNLib: A Flexible Convolution Algorithm Library for Deep Learning on FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 8 (2022), 2546--2559. Online publication date: Aug 2022. https://doi.org/10.1109/TCAD.2021.3108065

A. Cabrera, A. Young, J. Lambert, Z. Xiao, A. An, S. Lee, Z. Jin, J. Kim, J. Buhler, R. Chamberlain, and J. Vetter. 2021. Toward Evaluating High-Level Synthesis Portability and Performance between Intel and Xilinx FPGAs. In Proceedings of the 9th International Workshop on OpenCL. 1--9. Online publication date: 27 Apr 2021. https://dl.acm.org/doi/10.1145/3456669.3456699

A. Cabrera, S. Hitefield, J. Kim, S. Lee, N. Miniskar, and J. Vetter. 2021. Toward Performance Portable Programming for Heterogeneous Systems on a Chip: A Case Study with Qualcomm Snapdragon SoC. In 2021 IEEE High Performance Extreme Computing Conference (HPEC). 1--7. Online publication date: 20 Sep 2021. https://doi.org/10.1109/HPEC49654.2021.9622794

J. Lambert, S. Lee, J. Vetter, and A. Malony. 2020. CCAMP: An Integrated Translation and Optimization Framework for OpenACC and OpenMP. In SC20: International Conference for High Performance Computing, Networking, Storage and Analysis. 1--14. Online publication date: Nov 2020. https://doi.org/10.1109/SC41405.2020.00102

J. Lambert, S. Lee, J. Vetter, and A. Malony. 2020. In-Depth Optimization with the OpenACC-to-FPGA Framework on an Arria 10 FPGA. In 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW). 460--470. Online publication date: May 2020. https://doi.org/10.1109/IPDPSW50202.2020.00084
