
Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs

Authors: Nitish Srivastava, Joseph Featherston, Gustavo Angarita Velasquez, Zhiru Zhang

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Pages 269 - 278

https://doi.org/10.1145/3174243.3174255

Published: 15 February 2018

Abstract

Modern high-level synthesis (HLS) tools greatly reduce the turn-around time of designing and implementing complex FPGA-based accelerators. They also expose various optimization opportunities that cannot be easily explored at the register-transfer level. With the increasing adoption of the HLS design methodology and continued advances in synthesis optimization, there is a growing need for realistic benchmarks to (1) facilitate comparisons between tools, (2) evaluate and stress-test new synthesis techniques, and (3) establish meaningful performance baselines to track the progress of HLS technology. While several HLS benchmark suites already exist, they consist primarily of small, textbook-style function kernels rather than complete, complex applications. To address this limitation, we introduce Rosetta, a realistic benchmark suite for software programmable FPGAs. Designs in Rosetta are fully developed applications. They are associated with realistic performance constraints and optimized with advanced features of modern HLS tools. We believe that Rosetta is not only useful for the HLS research community, but can also serve as a set of design tutorials for non-expert HLS users. In this paper, we describe the characteristics of our benchmarks and the optimization techniques applied to them. We further report experimental results on an embedded FPGA device as well as a cloud FPGA platform.

Index Terms

Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Heterogeneous (hybrid) systems
      2. Reconfigurable computing
2. Hardware
  1. Electronic design automation

Published In

FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 2018

310 pages

ISBN:9781450356145

DOI:10.1145/3174243

General Chair: Jason H. Anderson, University of Toronto, Canada

Program Chair: Kia Bazargan, University of Minnesota, USA

Copyright © 2018 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

National Science Foundation
Defense Advanced Research Projects Agency

Conference

FPGA '18: The 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Sponsor: SIGDA

February 25 - 27, 2018

Monterey, California, USA

Acceptance Rates

FPGA '18 Paper Acceptance Rate 10 of 116 submissions, 9%;

Overall Acceptance Rate 125 of 627 submissions, 20%

Cited By

Santos T, Bispo J, Cardoso J, Shrivastava A, Sui Y (2024). A Flexible-Granularity Task Graph Representation and Its Generation from C Applications (WIP). Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pages 178-182. Online publication date: 20-Jun-2024.
https://dl.acm.org/doi/10.1145/3652032.3657580

Liao Y, Adegbija T, Lysecky R, Tandon R (2024). Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning. Proceedings of the Great Lakes Symposium on VLSI 2024, pages 170-176. Online publication date: 12-Jun-2024.
https://dl.acm.org/doi/10.1145/3649476.3658738

Miliadis P, Theodoropoulos D, Pnevmatikatos D, Koziris N (2024). Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources. ACM Transactions on Architecture and Code Optimization, 21:2, pages 1-26. Online publication date: 21-May-2024.
https://dl.acm.org/doi/10.1145/3648475

Gorius J, Rokicki S, Derrien S, Rodríguez G, Sadayappan P, Sukumaran-Rajam A (2024). A Unified Memory Dependency Framework for Speculative High-Level Synthesis. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, pages 13-25. Online publication date: 17-Feb-2024.
https://dl.acm.org/doi/10.1145/3640537.3641581

Park D, DeHon A, Zhang Z, Putnam A (2024). REFINE: Runtime Execution Feedback for INcremental Evolution on FPGA Designs. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pages 108-118. Online publication date: 1-Apr-2024.
https://dl.acm.org/doi/10.1145/3626202.3637560

Wan L, Huang Y, Li Y, Ye H, Wang J, Zhang X, Chen D (2024). Invited Paper: Software/Hardware Co-design for LLM and Its Application for Design Verification. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pages 435-441. Online publication date: 22-Jan-2024.
https://doi.org/10.1109/ASP-DAC58780.2024.10473893

Malik A, Karabulut E, Awad A, Aysu A (2024). Enabling Secure and Efficient Sharing of Accelerators in Expeditionary Systems. Journal of Hardware and Systems Security. Online publication date: 8-May-2024.
https://doi.org/10.1007/s41635-024-00148-4

Leipnitz M, Nazar G (2023). Constraint-Aware Multi-Technique Approximate High-Level Synthesis for FPGAs. ACM Transactions on Reconfigurable Technology and Systems, 16:4, pages 1-28. Online publication date: 9-Oct-2023.
https://dl.acm.org/doi/10.1145/3624481

Xiao Y, Park D, Niu Z, Hota A, Dehon A (2023). ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation. ACM Transactions on Reconfigurable Technology and Systems, 17:2, pages 1-28. Online publication date: 14-Sep-2023.
https://dl.acm.org/doi/10.1145/3617837

Wang J, Zhang Q, Rong H, Xu G, Kim M, Chandra S, Blincoe K, Tonella P (2023). Leveraging Hardware Probes and Optimizations for Accelerating Fuzz Testing of Heterogeneous Applications. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pages 1101-1113. Online publication date: 30-Nov-2023.
https://dl.acm.org/doi/10.1145/3611643.3616318
