
Finding and Finessing Static Islands in Dynamically Scheduled Circuits

Published: 11 February 2022 Publication History

Abstract

In high-level synthesis, scheduling is the process that determines the start time of each operation in hardware. A hardware design can be scheduled at compile time (static), at run time (dynamic), or with a combination of both. Recent research has shown that combining dynamic and static scheduling can achieve high performance and small area. However, it remains challenging to determine which parts of a design to schedule statically and which dynamically, and an inappropriate choice can lead to suboptimal design quality. This paper proposes a heuristic-driven approach to automatically determine 'static islands', i.e., code regions that are amenable to static scheduling. Over a set of benchmarks where our approach is applicable, we show that our tool can achieve on average a 3.8-fold reduction in area combined with a 13% performance boost through automatic identification and synthesis of static islands from fully dynamically scheduled circuits. The performance of the resulting hardware is close to optimum (as determined by an exhaustive enumeration of all possible static islands).
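
The exhaustive enumeration used as the reference point above can be pictured with a toy sketch. Everything in it is an assumption made for illustration: the region names, the area and cycle numbers, and the additive cost model are hypothetical and are not taken from the paper or its tool. Real circuits do not compose this simply, since the throughput of a dynamically scheduled circuit depends on how islands interact with their surroundings.

```python
from itertools import combinations

# Hypothetical candidate regions (not from the paper):
# name -> (area if dynamic, area if static, cycles if dynamic, cycles if static)
CANDIDATES = {
    "f": (400, 100, 50, 45),
    "g": (300, 90, 40, 60),
    "h": (200, 80, 30, 30),
}


def evaluate(static_set):
    """Sum area and cycles under an assumed, purely additive cost model."""
    area = 0
    cycles = 0
    for name, (a_dyn, a_sta, c_dyn, c_sta) in CANDIDATES.items():
        if name in static_set:
            area += a_sta
            cycles += c_sta
        else:
            area += a_dyn
            cycles += c_dyn
    return area, cycles


def enumerate_choices():
    """Yield every subset of candidate regions together with its cost."""
    names = sorted(CANDIDATES)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            yield frozenset(subset), evaluate(subset)


def best_choice(max_cycles):
    """Return the smallest-area choice whose cycle count stays within budget.

    Raises ValueError if no subset meets the budget.
    """
    feasible = [
        (area, cycles, chosen)
        for chosen, (area, cycles) in enumerate_choices()
        if cycles <= max_cycles
    ]
    area, cycles, chosen = min(
        feasible, key=lambda t: (t[0], t[1], sorted(t[2]))
    )
    return chosen, area, cycles


if __name__ == "__main__":
    # Budget: no slower than the fully dynamic baseline (the empty set).
    _, baseline_cycles = evaluate(frozenset())
    print(best_choice(baseline_cycles))
```

With n candidate regions the search visits 2^n subsets, which is why enumeration is only practical as a reference point for small benchmarks and a heuristic is attractive in general.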

Supplementary Material

MP4 File (FPGA22-fpgafp073.mp4)
Presentation video: "I don't know hardware, and how can I get high performance hardware from the DASS HLS tool?" HLS tools ideally allow software engineers without hardware background to program custom hardware to achieve high performance. Still, it remains the case that automatically synthesizing high performance and area-efficient hardware from arbitrary high-level programs is challenging. A hardware design can be scheduled either at compile time (static), run time (dynamic), or both. The DASS HLS tool combines the best of two worlds, dynamic and static scheduling, to synthesize more efficient hardware. However, DASS requires the user to manually choose scheduling techniques for each part of the code, which contradicts the purpose of HLS tools. In this video, we demonstrate how we solve this problem and get rid of the pain in the DASS...


Published In

FPGA '22: Proceedings of the 2022 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2022
211 pages
ISBN: 9781450391498
DOI: 10.1145/3490422

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. dynamic scheduling
  2. high-level synthesis
  3. static analysis

Qualifiers

  • Research-article

Conference

FPGA '22

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Cited By

  • (2024) Unifying Static and Dynamic Intermediate Languages for Accelerator Generators. Proceedings of the ACM on Programming Languages 8:OOPSLA2 (2242-2267). DOI: 10.1145/3689790. Online publication date: 8-Oct-2024
  • (2024) Hestia: An Efficient Cross-Level Debugger for High-Level Synthesis. 2024 57th IEEE/ACM International Symposium on Microarchitecture (MICRO) (765-779). DOI: 10.1109/MICRO61859.2024.00062. Online publication date: 2-Nov-2024
  • (2024) Efficient Design Space Exploration for Dynamic & Speculative High-Level Synthesis. 2024 34th International Conference on Field-Programmable Logic and Applications (FPL) (109-117). DOI: 10.1109/FPL64840.2024.00024. Online publication date: 2-Sep-2024
  • (2023) Balancing Static Islands in Dynamically Scheduled Circuits Using Continuous Petri Nets. IEEE Transactions on Computers 72:11 (3300-3313). DOI: 10.1109/TC.2023.3292590. Online publication date: 13-Jul-2023
  • (2023) A High-Frequency Load-Store Queue with Speculative Allocations for High-Level Synthesis. 2023 International Conference on Field Programmable Technology (ICFPT) (115-124). DOI: 10.1109/ICFPT59805.2023.00018. Online publication date: 12-Dec-2023
  • (2023) Compiler Discovered Dynamic Scheduling of Irregular Code in High-Level Synthesis. 2023 33rd International Conference on Field-Programmable Logic and Applications (FPL) (1-9). DOI: 10.1109/FPL60245.2023.00009. Online publication date: 4-Sep-2023
