DOI: 10.1145/3174243.3174255

Rosetta: A Realistic High-Level Synthesis Benchmark Suite for Software Programmable FPGAs

Published: 15 February 2018
    Abstract

    Modern high-level synthesis (HLS) tools greatly reduce the turnaround time of designing and implementing complex FPGA-based accelerators. They also expose various optimization opportunities that cannot be easily explored at the register-transfer level. With the increasing adoption of the HLS design methodology and continued advances in synthesis optimization, there is a growing need for realistic benchmarks to (1) facilitate comparisons between tools, (2) evaluate and stress-test new synthesis techniques, and (3) establish meaningful performance baselines to track the progress of HLS technology. While several HLS benchmark suites already exist, they consist primarily of small textbook-style function kernels rather than complete and complex applications. To address this limitation, we introduce Rosetta, a realistic benchmark suite for software programmable FPGAs. Designs in Rosetta are fully developed applications. They are associated with realistic performance constraints and are optimized with advanced features of modern HLS tools. We believe that Rosetta is not only useful for the HLS research community, but can also serve as a set of design tutorials for non-expert HLS users. In this paper, we describe the characteristics of our benchmarks and the optimization techniques applied to them. We further report experimental results on an embedded FPGA device as well as a cloud FPGA platform.

    Published In

    FPGA '18: Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
    February 2018
    310 pages
    ISBN: 9781450356145
    DOI: 10.1145/3174243
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. benchmarking
    2. fpga
    3. heterogeneous computing
    4. high-level synthesis
    5. reconfigurable computing

    Qualifiers

    • Research-article

    Conference

    FPGA '18

    Acceptance Rates

    FPGA '18 Paper Acceptance Rate: 10 of 116 submissions (9%)
    Overall Acceptance Rate: 125 of 627 submissions (20%)

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 271
    • Downloads (last 6 weeks): 24
    Reflects downloads up to 11 Aug 2024

    Cited By

    • (2024) A Flexible-Granularity Task Graph Representation and Its Generation from C Applications (WIP). Proceedings of the 25th ACM SIGPLAN/SIGBED International Conference on Languages, Compilers, and Tools for Embedded Systems, pp. 178-182. DOI: 10.1145/3652032.3657580. Online publication date: 20-Jun-2024
    • (2024) Skip the Benchmark: Generating System-Level High-Level Synthesis Data using Generative Machine Learning. Proceedings of the Great Lakes Symposium on VLSI 2024, pp. 170-176. DOI: 10.1145/3649476.3658738. Online publication date: 12-Jun-2024
    • (2024) Architectural Support for Sharing, Isolating and Virtualizing FPGA Resources. ACM Transactions on Architecture and Code Optimization, 21(2):1-26. DOI: 10.1145/3648475. Online publication date: 21-May-2024
    • (2024) A Unified Memory Dependency Framework for Speculative High-Level Synthesis. Proceedings of the 33rd ACM SIGPLAN International Conference on Compiler Construction, pp. 13-25. DOI: 10.1145/3640537.3641581. Online publication date: 17-Feb-2024
    • (2024) REFINE: Runtime Execution Feedback for INcremental Evolution on FPGA Designs. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pp. 108-118. DOI: 10.1145/3626202.3637560. Online publication date: 1-Apr-2024
    • (2024) Invited Paper: Software/Hardware Co-design for LLM and Its Application for Design Verification. 2024 29th Asia and South Pacific Design Automation Conference (ASP-DAC), pp. 435-441. DOI: 10.1109/ASP-DAC58780.2024.10473893. Online publication date: 22-Jan-2024
    • (2024) Enabling Secure and Efficient Sharing of Accelerators in Expeditionary Systems. Journal of Hardware and Systems Security. DOI: 10.1007/s41635-024-00148-4. Online publication date: 8-May-2024
    • (2023) Constraint-Aware Multi-Technique Approximate High-Level Synthesis for FPGAs. ACM Transactions on Reconfigurable Technology and Systems, 16(4):1-28. DOI: 10.1145/3624481. Online publication date: 9-Oct-2023
    • (2023) ExHiPR: Extended High-Level Partial Reconfiguration for Fast Incremental FPGA Compilation. ACM Transactions on Reconfigurable Technology and Systems, 17(2):1-28. DOI: 10.1145/3617837. Online publication date: 14-Sep-2023
    • (2023) Leveraging Hardware Probes and Optimizations for Accelerating Fuzz Testing of Heterogeneous Applications. Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, pp. 1101-1113. DOI: 10.1145/3611643.3616318. Online publication date: 30-Nov-2023