DOI: 10.1145/3373087.3375314
Research article, open access

Buffer Placement and Sizing for High-Performance Dataflow Circuits

Published: 24 February 2020

Abstract

Commercial high-level synthesis tools typically produce statically scheduled circuits. Yet, effective C-to-circuit conversion of arbitrary software applications calls for dataflow circuits, as they can handle efficiently variable latencies (e.g., caches) and unpredictable memory dependencies. Dataflow circuits exhibit an unconventional property: registers (usually referred to as "buffers") can be placed anywhere in the circuit without changing its semantics, in strong contrast to what happens in traditional datapaths. Yet, although functionally irrelevant, this placement has a significant impact on the circuit's timing and throughput. In this work, we show how to strategically place buffers into a dataflow circuit to optimize its performance. Our approach extracts a set of choice-free critical loops from arbitrary dataflow circuits and relies on the theory of marked graphs to optimize the buffer placement and sizing. We demonstrate the performance benefits of our approach on a set of dataflow circuits obtained from imperative code.
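The marked-graph view mentioned in the abstract can be illustrated with a small sketch. It relies on a standard result for timed marked graphs: the steady-state throughput is bounded by the smallest ratio of tokens to latency over all directed cycles. This is not the paper's optimization formulation. The edge encoding `(src, dst, tokens, latency)`, the function names, and the example circuit are all assumptions made for illustration, and the cycle enumeration is a plain depth-first search suited only to small graphs.

```python
from fractions import Fraction


def simple_cycles(edges):
    """Yield every simple directed cycle once, as a list of edge indices.

    `edges` is a list of (src, dst, tokens, latency) tuples (hypothetical
    encoding). Each cycle is reported from its smallest node only.
    """
    out = {}
    for i, (src, _dst, _tokens, _latency) in enumerate(edges):
        out.setdefault(src, []).append(i)
    nodes = sorted({n for src, dst, _, _ in edges for n in (src, dst)})

    def dfs(start, node, path, seen):
        for i in out.get(node, []):
            nxt = edges[i][1]
            if nxt == start:
                yield path + [i]
            elif nxt not in seen and nxt > start:
                yield from dfs(start, nxt, path + [i], seen | {nxt})

    for start in nodes:
        yield from dfs(start, start, [], {start})


def throughput_bound(edges):
    """Return min over cycles of tokens/latency, or None if there is no cycle.

    Assumes every cycle has a strictly positive total latency.
    """
    ratios = []
    for cycle in simple_cycles(edges):
        tokens = sum(edges[i][2] for i in cycle)
        latency = sum(edges[i][3] for i in cycle)
        ratios.append(Fraction(tokens, latency))
    return min(ratios) if ratios else None


# Hypothetical circuit with two loops sharing the channel a -> b.
example = [
    ("a", "b", 1, 1),  # one token in flight, one cycle of latency
    ("b", "c", 0, 2),
    ("c", "a", 0, 1),
    ("b", "a", 0, 1),
]

print(throughput_bound(example))  # prints 1/4
```

In this toy example the longer loop (a, b, c) carries one token over four cycles of latency, so it limits the whole circuit to one result every four cycles even though the shorter loop could sustain one every two. Adding a register to a channel raises the latency of every loop through that channel, which is why placement affects throughput as well as timing.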

Published In

FPGA '20: Proceedings of the 2020 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays
February 2020
346 pages
ISBN: 9781450370998
DOI: 10.1145/3373087
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

Badges

  • Best Paper

Author Tags

  1. buffers
  2. dataflow circuits
  3. high-level synthesis
  4. timing optimization

Qualifiers

  • Research-article

Funding Sources

  • MINECO TIN
  • Google PhD Fellowship
  • GENCAT

Conference

FPGA '20
Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 163
  • Downloads (last 6 weeks): 18
Reflects downloads up to 22 Feb 2025

Cited By

  • (2025) CRUSH: A Credit-Based Approach for Functional Unit Sharing in Dynamically Scheduled HLS. Proceedings of the 30th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Volume 1, pages 249-263. DOI: 10.1145/3669940.3707273. Online publication date: 3-Feb-2025
  • (2024) PASTA: Programming and Automation Support for Scalable Task-Parallel HLS Programs on Modern Multi-Die FPGAs. ACM Transactions on Reconfigurable Technology and Systems, 17:3, pages 1-31. DOI: 10.1145/3676849. Online publication date: 5-Aug-2024
  • (2024) Suppressing Spurious Dynamism of Dataflow Circuits via Latency and Occupancy Balancing. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pages 188-198. DOI: 10.1145/3626202.3637570. Online publication date: 1-Apr-2024
  • (2024) Survival of the Fastest: Enabling More Out-of-Order Execution in Dataflow Circuits. Proceedings of the 2024 ACM/SIGDA International Symposium on Field Programmable Gate Arrays, pages 44-54. DOI: 10.1145/3626202.3637556. Online publication date: 1-Apr-2024
  • (2024) Automated Buffer Sizing of Dataflow Applications in a High-level Synthesis Workflow. ACM Transactions on Reconfigurable Technology and Systems, 17:1, pages 1-26. DOI: 10.1145/3626103. Online publication date: 27-Jan-2024
  • (2024) Fast Switching Activity Estimation for HLS-Produced Dataflow Circuits. 2024 34th International Conference on Field-Programmable Logic and Applications (FPL), pages 118-125. DOI: 10.1109/FPL64840.2024.00025. Online publication date: 2-Sep-2024
  • (2024) DynaRapid: Fast-Tracking from C to Routed Circuits. 2024 34th International Conference on Field-Programmable Logic and Applications (FPL), pages 24-32. DOI: 10.1109/FPL64840.2024.00014. Online publication date: 2-Sep-2024
  • (2024) High-Level Synthesis. FPGA EDA, pages 113-134. DOI: 10.1007/978-981-99-7755-0_8. Online publication date: 1-Feb-2024
  • (2023) TAPA: A Scalable Task-parallel Dataflow Programming Framework for Modern FPGAs with Co-optimization of HLS and Physical Design. ACM Transactions on Reconfigurable Technology and Systems, 16:4, pages 1-31. DOI: 10.1145/3609335. Online publication date: 18-Sep-2023
  • (2023) Balancing Static Islands in Dynamically Scheduled Circuits Using Continuous Petri Nets. IEEE Transactions on Computers, 72:11, pages 3300-3313. DOI: 10.1109/TC.2023.3292590. Online publication date: 1-Nov-2023
