DOI: 10.1145/3205289.3205324
research-article
Public Access

Directive-Based, High-Level Programming and Optimizations for High-Performance Computing with FPGAs

Published: 12 June 2018

Abstract

Reconfigurable architectures like Field-Programmable Gate Arrays (FPGAs) have been used for accelerating computations from several domains because of their unique combination of flexibility, performance, and power efficiency. However, FPGAs have not been widely used for high-performance computing, primarily because of their programming complexity and difficulties in optimizing performance. In this paper, we present a directive-based, high-level optimization framework for high-performance computing with FPGAs, built on top of an OpenACC-to-FPGA translation framework called OpenARC. We propose directive extensions and corresponding compile-time optimization techniques to enable the compiler to generate more efficient FPGA hardware configuration files. Empirical evaluation of the proposed framework on an Intel Stratix V FPGA with five OpenACC benchmarks from various application domains shows that FPGA-specific optimizations can lead to significant increases in performance across all tested applications. We also demonstrate that applying these high-level directive-based optimizations can allow OpenACC applications to perform similarly to lower-level OpenCL applications with hand-written FPGA-specific optimizations, and to offer runtime and power benefits compared with CPUs and GPUs.
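To illustrate what "directive-based" programming looks like in practice, the sketch below shows a plain OpenACC loop in C. It is an illustrative assumption, not code from the paper: the function `moving_sum` and its window width `w` are hypothetical, and it uses only standard OpenACC directives (`parallel loop` with `copyin`/`copyout` data clauses), not the FPGA-specific directive extensions the paper proposes. A compiler built without OpenACC support typically ignores the pragma and runs the loop serially.

```c
/* Hypothetical sliding-window sum (illustration only):
   out[i] = in[i] + in[i + 1] + ... + in[i + w - 1], for 0 <= i <= n - w.
   Assumes 1 <= w <= n. Uses standard OpenACC only, not the
   FPGA-specific directive extensions proposed in the paper. */
void moving_sum(const float *restrict in, float *restrict out, int n, int w)
{
    /* Copy the whole input to the accelerator and the n - w + 1
       results back to the host; iterations of the outer loop are
       independent, so they may run in parallel. */
    #pragma acc parallel loop copyin(in[0:n]) copyout(out[0:n-w+1])
    for (int i = 0; i <= n - w; i++) {
        float sum = 0.0f;
        /* Sum the w elements of the current window. */
        for (int k = 0; k < w; k++) {
            sum += in[i + k];
        }
        out[i] = sum;
    }
}
```

For example, with `in = {1, 2, 3, 4, 5}` and `w = 3`, the three outputs are 6, 9 and 12. The example recomputes each window from scratch to keep the directive usage easy to read.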

Information

Published In

ICS '18: Proceedings of the 2018 International Conference on Supercomputing
June 2018
407 pages
ISBN: 9781450357838
DOI: 10.1145/3205289
© 2018 Association for Computing Machinery. ACM acknowledges that this contribution was authored or co-authored by an employee, contractor or affiliate of the United States government. As such, the United States Government retains a nonexclusive, royalty-free right to publish or reproduce this article, or to allow others to do so, for Government purposes only.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA
  2. OpenACC
  3. OpenARC
  4. OpenCL
  5. directive-based programming
  6. reconfigurable computing
  7. sliding window

Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

ICS '18

Acceptance Rates

Overall Acceptance Rate 629 of 2,180 submissions, 29%

Article Metrics

  • Downloads (last 12 months): 53
  • Downloads (last 6 weeks): 5
Reflects downloads up to 26 Sep 2024

Cited By

  • (2024) High-Level Programming of FPGA-Accelerated Systems with Parallel Patterns. International Journal of Parallel Programming 52:4, pp. 253-273. DOI: 10.1007/s10766-024-00770-3. Online publication date: 27-May-2024
  • (2022) FCNNLib: A Flexible Convolution Algorithm Library for Deep Learning on FPGAs. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41:8, pp. 2546-2559. DOI: 10.1109/TCAD.2021.3108065. Online publication date: Aug-2022
  • (2021) Toward Evaluating High-Level Synthesis Portability and Performance between Intel and Xilinx FPGAs. Proceedings of the 9th International Workshop on OpenCL, pp. 1-9. DOI: 10.1145/3456669.3456699. Online publication date: 27-Apr-2021
  • (2021) Toward Performance Portable Programming for Heterogeneous Systems on a Chip: A Case Study with Qualcomm Snapdragon SoC. 2021 IEEE High Performance Extreme Computing Conference (HPEC), pp. 1-7. DOI: 10.1109/HPEC49654.2021.9622794. Online publication date: 20-Sep-2021
  • (2020) CCAMP: An Integrated Translation and optimization Framework for OpenACC and OpenMP. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, pp. 1-14. DOI: 10.1109/SC41405.2020.00102. Online publication date: Nov-2020
  • (2020) In-Depth Optimization with the OpenACC-to-FPGA Framework on an Arria 10 FPGA. 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), pp. 460-470. DOI: 10.1109/IPDPSW50202.2020.00084. Online publication date: May-2020
