ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests

Authors:

Zhiru Zhang

ICCAD '15: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design

Pages 78 - 85

Published: 02 November 2015

Abstract

Modern high-level synthesis (HLS) tools commonly employ pipelining to achieve efficient loop acceleration by overlapping the execution of successive loop iterations. However, existing HLS techniques provide inadequate support for pipelining irregular loop nests that contain dynamic-bound inner loops, where unrolling is either very expensive or not even applicable. To overcome this major limitation, we propose ElasticFlow, a novel architectural synthesis approach capable of dynamically distributing inner loops to an array of loop processing units (LPUs) in a complexity-effective manner. These LPUs can be either specialized to execute an individual loop or shared amongst multiple inner loops for area reduction. We evaluate ElasticFlow using a variety of real-life applications and demonstrate significant performance improvements over a widely used commercial HLS tool for Xilinx FPGAs.
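
To make the term "irregular loop nest" concrete, the following is a minimal sketch of the kind of code the abstract describes: an outer loop whose inner loop has a trip count known only at run time. It is an illustration, not code from the paper. The chained hash table, the array names, and the function `lookup_all` are all hypothetical, and keys are assumed to be non-negative.

```c
#define NUM_BUCKETS 4
#define MAX_NODES 8

/* Hypothetical chained hash table stored in flat arrays (assumption,
 * not from the paper). An index of -1 marks the end of a chain. */
static const int bucket_head[NUM_BUCKETS] = {0, -1, 2, 5};
static const int node_key[MAX_NODES]  = {4, 8, 2, 6, 10, 3, 7, 11};
static const int node_val[MAX_NODES]  = {40, 80, 20, 60, 100, 30, 70, 110};
static const int node_next[MAX_NODES] = {1, -1, 3, 4, -1, 5 + 1, 7, -1};

/* Looks up n non-negative keys; writes the value (or -1 if absent) to out[].
 * Returns the total number of inner-loop iterations executed. */
int lookup_all(const int keys[], int n, int out[]) {
    int total_inner_iters = 0;
    for (int i = 0; i < n; i++) {            /* outer loop: one key per iteration */
        int idx = bucket_head[keys[i] % NUM_BUCKETS];
        out[i] = -1;
        while (idx != -1) {                   /* inner loop: data-dependent bound */
            total_inner_iters++;
            if (node_key[idx] == keys[i]) {
                out[i] = node_val[idx];
                break;
            }
            idx = node_next[idx];
        }
    }
    return total_inner_iters;
}
```

Because the chain length differs per key, the inner `while` loop cannot be fully unrolled at compile time, which is the situation the abstract identifies as poorly served by conventional loop pipelining. Each outer iteration here is independent, so in principle separate inner-loop instances could be handed to different processing units.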

Index Terms

ElasticFlow: A Complexity-Effective Approach for Pipelining Irregular Loop Nests
1. Applied computing
  1. Arts and humanities
    1. Architecture (buildings)
      1. Computer-aided design
  2. Physical sciences and engineering
    1. Engineering
      1. Computer-aided design
2. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
  2. Hardware validation

Published In

ICCAD '15: Proceedings of the IEEE/ACM International Conference on Computer-Aided Design

November 2015

955 pages

ISBN: 9781467383899

General Chair:
Diana Marculescu
Carnegie Mellon Univ., Dept. of Electrical and Computer Engineering, 5000 Forbes Ave., Pittsburgh, PA 15213, 412-268-1167

Program Chair:
Frank Liu
IBM Corp., IBM Austin Research Lab, 11501 Burnet Rd., MS 904-6G017, Austin, TX 78758, 512-286-7267

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

IEEE Press

Qualifiers

Tutorial
Research
Refereed limited

Conference

ICCAD '15

Sponsor:

SIGDA

ICCAD '15: IEEE/ACM INTERNATIONAL CONFERENCE ON COMPUTER-AIDED DESIGN

November 2 - 6, 2015

Austin, TX, USA

Acceptance Rates

Overall Acceptance Rate 457 of 1,762 submissions, 26%

Cited By

Lai Y, Ustun E, Xiang S, Fang Z, Rong H, Zhang Z (2021). Programming and Synthesis for Software-defined FPGA Acceleration: Status and Future Prospects. ACM Transactions on Reconfigurable Technology and Systems, 14:4 (1-39). doi:10.1145/3469660. Online publication date: 13-Sep-2021
https://dl.acm.org/doi/10.1145/3469660
Vilim M, Rucker A, Olukotun K, Martínez J, Duato J, John L (2021). Aurochs. Proceedings of the 48th Annual International Symposium on Computer Architecture (402-415). doi:10.1109/ISCA52012.2021.00039. Online publication date: 14-Jun-2021
https://dl.acm.org/doi/10.1109/ISCA52012.2021.00039
Dai S, Liu G, Zhang Z, Anderson J, Bazargan K (2018). A Scalable Approach to Exact Resource-Constrained Scheduling Based on a Joint SDC and SAT Formulation. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (137-146). doi:10.1145/3174243.3174268. Online publication date: 15-Feb-2018
https://dl.acm.org/doi/10.1145/3174243.3174268
Josipović L, Ghosal R, Ienne P, Anderson J, Bazargan K (2018). Dynamically Scheduled High-level Synthesis. Proceedings of the 2018 ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (127-136). doi:10.1145/3174243.3174264. Online publication date: 15-Feb-2018
https://dl.acm.org/doi/10.1145/3174243.3174264
Liu J, Wickerson J, Bayliss S, Constantinides G (2018). Polyhedral-Based Dynamic Loop Pipelining for High-Level Synthesis. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 37:9 (1802-1815). doi:10.1109/TCAD.2017.2783363. Online publication date: 1-Sep-2018
https://dl.acm.org/doi/10.1109/TCAD.2017.2783363
Josipovic L, Brisk P, Ienne P (2017). An Out-of-Order Load-Store Queue for Spatial Computing. ACM Transactions on Embedded Computing Systems, 16:5s (1-19). doi:10.1145/3126525. Online publication date: 27-Sep-2017
https://dl.acm.org/doi/10.1145/3126525
(2017). Exploiting vectorization in high level synthesis of nested irregular loops. Journal of Systems Architecture: the EUROMICRO Journal, 75:C (1-14). doi:10.1016/j.sysarc.2017.03.001. Online publication date: 1-Apr-2017
https://dl.acm.org/doi/10.1016/j.sysarc.2017.03.001
Zhao R, Liu G, Srinath S, Batten C, Zhang Z (2016). Improving high-level synthesis with decoupled data structure optimization. Proceedings of the 53rd Annual Design Automation Conference (1-6). doi:10.1145/2897937.2898030. Online publication date: 5-Jun-2016
https://dl.acm.org/doi/10.1145/2897937.2898030
