Buffer Placement and Sizing for High-Performance Dataflow Circuits

Authors: Lana Josipović, Shabnam Sheikhha, Andrea Guerrieri, Jordi Cortadella

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

Pages 186 - 196

https://doi.org/10.1145/3373087.3375314

Published: 24 February 2020

Abstract

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can efficiently handle variable latencies (e.g., caches) and unpredictable memory dependencies. Dataflow circuits exhibit an unconventional property: registers (usually referred to as "buffers") can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Although functionally irrelevant, this placement has a significant impact on the circuit's timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
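To make the link between buffers, loops, and throughput concrete, the following is a minimal sketch of the classic throughput bound for timed marked graphs: each cycle can sustain at most (tokens on the cycle) / (total latency of the cycle), and the slowest cycle limits the whole graph. This is an illustration only, not the paper's algorithm. The edge tuple layout `(src, dst, latency, tokens)`, the function names, and the example graph are assumptions made here, and buffer capacity and backpressure are ignored.

```python
from fractions import Fraction


def simple_cycles(edges):
    """Yield each simple cycle as a list of edge indices (small graphs only)."""
    out = {}
    for i, (src, _dst, _lat, _tok) in enumerate(edges):
        out.setdefault(src, []).append(i)
    nodes = sorted({e[0] for e in edges} | {e[1] for e in edges})
    for start in nodes:
        # Only walk through nodes larger than `start`, so every cycle is
        # reported exactly once, rooted at its smallest node.
        stack = [(start, [], {start})]
        while stack:
            node, path, seen = stack.pop()
            for i in out.get(node, []):
                dst = edges[i][1]
                if dst == start:
                    yield path + [i]
                elif dst > start and dst not in seen:
                    stack.append((dst, path + [i], seen | {dst}))


def throughput(edges):
    """Bound on tokens per clock cycle: min over cycles of tokens / latency.

    The result is capped at 1 (assumption: at most one token per cycle
    per channel). An acyclic graph is not limited by any loop.
    """
    best = Fraction(1)
    for cycle in simple_cycles(edges):
        latency = sum(edges[i][2] for i in cycle)
        tokens = sum(edges[i][3] for i in cycle)
        if latency == 0:
            raise ValueError("cycle with zero latency (combinational loop)")
        best = min(best, Fraction(tokens, latency))
    return best


# Hypothetical three-unit loop: one token circulating, four cycles of latency.
loop = [("A", "B", 1, 1), ("B", "C", 2, 0), ("C", "A", 1, 0)]
print(throughput(loop))  # prints 1/4
```

In this toy model, the loop above completes one iteration every four clock cycles. The brute-force cycle enumeration is only practical for small graphs and is used here for readability.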

Published In

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 2020

346 pages

ISBN:9781450370998

DOI:10.1145/3373087

General Chair: Stephen Neuendorffer, Xilinx, USA

Program Chair: Lesley Shannon, Simon Fraser University, Canada

Copyright © 2020 Owner/Author.

Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

Best Paper

Qualifiers

Research-article

Funding Sources

MINECO TIN
Google PhD Fellowship
GENCAT

Conference

FPGA '20

Sponsor:

SIGDA

FPGA '20: The 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays

February 23 - 25, 2020

Seaside, CA, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%
