ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

Published: 02 November 2015
DOI: 10.5555/2840819.2840831

Abstract

Modern high-level synthesis (HLS) tools commonly employ pipelining to achieve efficient loop acceleration by overlapping the execution of successive loop iterations. However, existing HLS techniques provide inadequate support for pipelining irregular loop nests that contain dynamic-bound inner loops, where unrolling is either very expensive or not even applicable. To overcome this major limitation, we propose ElasticFlow, a novel architectural synthesis approach capable of dynamically distributing inner loops to an array of loop processing units (LPUs) in a complexity-effective manner. These LPUs can be either specialized to execute an individual loop or shared amongst multiple inner loops for area reduction. We evaluate ElasticFlow using a variety of real-life applications and demonstrate significant performance improvements over a widely used commercial HLS tool for Xilinx FPGAs.
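The scheduling problem the abstract describes can be made concrete with a small cycle model. The sketch below is an illustrative assumption, not the ElasticFlow architecture from the paper. It contrasts two schedules for a loop nest whose inner trip count depends on the data: (a) pipelining each inner-loop invocation on its own and running outer iterations back to back, and (b) dispatching inner-loop instances in order to the earliest-free of `num_lpus` identical loop processing units. The function names, the CSR-style row example, the pipeline depth, and the cost model (II = 1; a pipelined loop of n iterations finishes in n + depth - 1 cycles; one dispatch per cycle; no memory stalls; no LPU sharing) are all hypothetical.

```python
"""Toy cycle model of an irregular loop nest on pipelined hardware.

Hypothetical sketch (not ElasticFlow itself). Assumptions: inner-loop
iterations issue every cycle (II = 1), a pipelined inner loop with n
iterations and pipeline depth `depth` takes n + depth - 1 cycles, a
zero-trip inner loop takes 1 cycle, and memory never stalls.
"""
import heapq
from typing import List


def irregular_nest(row_ptr: List[int], vals: List[int]) -> List[int]:
    """Reference computation: the inner trip count depends on the data."""
    out = []
    for i in range(len(row_ptr) - 1):                 # outer loop
        acc = 0
        for j in range(row_ptr[i], row_ptr[i + 1]):   # dynamic-bound inner loop
            acc += vals[j]
        out.append(acc)
    return out


def trip_counts(row_ptr: List[int]) -> List[int]:
    """Inner-loop trip count for each outer iteration."""
    return [row_ptr[i + 1] - row_ptr[i] for i in range(len(row_ptr) - 1)]


def inner_latency(n: int, depth: int) -> int:
    """Cycles for one pipelined inner-loop invocation (II = 1)."""
    return n + depth - 1 if n > 0 else 1


def sequential_cycles(trips: List[int], depth: int) -> int:
    """Inner loop pipelined; outer iterations run strictly one after another."""
    return sum(inner_latency(n, depth) for n in trips)


def distributed_cycles(trips: List[int], depth: int, num_lpus: int) -> int:
    """In-order dispatch of inner-loop instances to the earliest-free LPU."""
    free_at = [0] * num_lpus      # min-heap: cycle at which each LPU goes idle
    issue = 0                     # earliest cycle the dispatcher may issue next
    finish = 0
    for n in trips:
        lpu_free = heapq.heappop(free_at)
        start = max(issue, lpu_free)          # stall until an LPU is idle
        end = start + inner_latency(n, depth)
        heapq.heappush(free_at, end)
        issue = start + 1                     # at most one dispatch per cycle
        finish = max(finish, end)
    return finish


if __name__ == "__main__":
    row_ptr = [0, 3, 4, 10, 12]               # rows of length 3, 1, 6, 2
    vals = list(range(12))
    print(irregular_nest(row_ptr, vals))      # [3, 3, 39, 21]
    trips = trip_counts(row_ptr)
    print(sequential_cycles(trips, depth=4))  # 24
    for k in (1, 2, 4):
        # prints "1 24", "2 14", "4 11"
        print(k, distributed_cycles(trips, depth=4, num_lpus=k))
```

With one LPU the model reduces to the sequential schedule; with more LPUs, short inner loops overlap with long ones, so the total is bounded more by the longest instances than by the sum of all of them. Because instances may finish out of order, a hardware implementation would generally also need to return results in program order where the outer loop requires it; this toy model does not account for that cost.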



Published In

ICCAD '15: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design
November 2015
955 pages
ISBN: 9781467383899
  • General Chair: Diana Marculescu
  • Program Chair: Frank Liu

Publisher

IEEE Press


Qualifiers

  • Tutorial
  • Research
  • Refereed limited

Conference

ICCAD '15

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Article Metrics

  • Downloads (last 12 months): 3
  • Downloads (last 6 weeks): 0
Reflects downloads up to 12 Feb 2025

Cited By

  • (2021) Programming and Synthesis for Software-defined FPGA Acceleration: Status and Future Prospects. ACM Transactions on Reconfigurable Technology and Systems 14:4 (1-39). DOI: 10.1145/3469660. Online publication date: 13-Sep-2021.
  • (2021) Aurochs. Proceedings of the 48th Annual International Symposium on Computer Architecture, pp. 402-415. DOI: 10.1109/ISCA52012.2021.00039. Online publication date: 14-Jun-2021.
  • (2018) A Scalable Approach to Exact Resource-Constrained Scheduling Based on a Joint SDC and SAT Formulation. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 137-146. DOI: 10.1145/3174243.3174268. Online publication date: 15-Feb-2018.
  • (2018) Dynamically Scheduled High-level Synthesis. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays, pp. 127-136. DOI: 10.1145/3174243.3174264. Online publication date: 15-Feb-2018.
  • (2018) Polyhedral-Based Dynamic Loop Pipelining for High-Level Synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37:9 (1802-1815). DOI: 10.1109/TCAD.2017.2783363. Online publication date: 1-Sep-2018.
  • (2017) An Out-of-Order Load-Store Queue for Spatial Computing. ACM Transactions on Embedded Computing Systems 16:5s (1-19). DOI: 10.1145/3126525. Online publication date: 27-Sep-2017.
  • (2017) Exploiting vectorization in high level synthesis of nested irregular loops. Journal of Systems Architecture: the EUROMICRO Journal 75:C (1-14). DOI: 10.1016/j.sysarc.2017.03.001. Online publication date: 1-Apr-2017.
  • (2016) Improving high-level synthesis with decoupled data structure optimization. Proceedings of the 53rd Annual Design Automation Conference, pp. 1-6. DOI: 10.1145/2897937.2898030. Online publication date: 5-Jun-2016.
