C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction

Authors:

James C. Hoe

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

Pages 221 - 230

https://doi.org/10.1145/2435264.2435302

Published: 11 February 2013

Abstract

This paper presents initial work on developing a C compiler for the CoRAM FPGA computing abstraction. The presented effort focuses on compiling fixed-bound perfect loop nests that operate on large data sets in external DRAM. As required by the CoRAM abstraction, the compiler partitions source code into two separate implementation components: (1) hardware kernel pipelines to be mapped onto the reconfigurable logic fabric; and (2) control threads that express, in a C-like language, the sequencing and coordination of data transfers between the hardware kernels and external DRAM. The compiler performs optimizations to increase parallelism and use DRAM bandwidth efficiently. It can target different FPGA platforms that support the CoRAM abstraction, either natively in a future FPGA or in soft-logic on today's devices. The CoRAM abstraction provides a convenient high-level compilation target to simplify the task of design optimization and system generation. The compiler is evaluated using three test programs (matrix-matrix multiplication, k-nearest neighbor, and 2D convolution) on the Xilinx ML605 and the Altera DE4. Results show that our compiler is able to target the different platforms and effectively exploit their dissimilar capacities and features. Depending on the application, the compiler-generated implementations achieve performance ranging from a factor of 4 slower to a factor of 2 faster relative to hand-designed implementations, as measured on actual hardware.
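
As a rough illustration of the input class the abstract describes, the sketch below writes matrix-matrix multiplication as a fixed-bound perfect loop nest, then as a hand-tiled variant that separates block data movement from computation on small local buffers. This is plain C for exposition only: it is not CoRAM control-thread code and not output of the compiler, and the names `mmm_reference`, `mmm_tiled`, `load_tile`, `store_tile`, and `kernel_tile`, as well as the sizes `N` and `T`, are assumptions chosen for the example.

```c
#include <string.h>

#define N 8   /* fixed loop bound, known at compile time (illustrative value) */
#define T 4   /* tile edge; assumed to divide N evenly (illustrative value)   */

/* Fixed-bound perfect loop nest: every statement sits in the innermost
 * loop body and all bounds are constants. The caller zero-initializes C. */
void mmm_reference(float A[N][N], float B[N][N], float C[N][N])
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++)
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}

/* Stand-in for the data-movement role: copy one T x T tile from a large
 * array (playing the part of external DRAM) into a small local buffer. */
static void load_tile(float src[N][N], int row0, int col0, float dst[T][T])
{
    for (int r = 0; r < T; r++)
        memcpy(dst[r], &src[row0 + r][col0], T * sizeof(float));
}

/* Copy a local T x T tile back into the large array. */
static void store_tile(float src[T][T], int row0, int col0, float dst[N][N])
{
    for (int r = 0; r < T; r++)
        memcpy(&dst[row0 + r][col0], src[r], T * sizeof(float));
}

/* Stand-in for the compute role: reads and writes local buffers only. */
static void kernel_tile(float a[T][T], float b[T][T], float c[T][T])
{
    for (int i = 0; i < T; i++)
        for (int j = 0; j < T; j++)
            for (int k = 0; k < T; k++)
                c[i][j] += a[i][k] * b[k][j];
}

/* Tiled form: the outer loops only sequence tile transfers, and the
 * arithmetic happens inside kernel_tile on the local buffers. */
void mmm_tiled(float A[N][N], float B[N][N], float C[N][N])
{
    float a[T][T], b[T][T], c[T][T];

    for (int ii = 0; ii < N; ii += T)
        for (int jj = 0; jj < N; jj += T) {
            load_tile(C, ii, jj, c);           /* fetch the output tile      */
            for (int kk = 0; kk < N; kk += T) {
                load_tile(A, ii, kk, a);       /* fetch operand tiles        */
                load_tile(B, kk, jj, b);
                kernel_tile(a, b, c);          /* accumulate into local tile */
            }
            store_tile(c, ii, jj, C);          /* write the result back      */
        }
}
```

Calling both functions on the same zero-initialized output array should give the same result, since the tiled version accumulates each output element in the same `k` order as the reference. The split between the transfer helpers and `kernel_tile` is only meant to mirror, loosely, the two-component partition described in the abstract.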

Index Terms

C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction
1. Hardware
  1. Electronic design automation
    1. High-level and register-transfer level synthesis
      1. Datapath optimization
    2. Logic synthesis
      1. Circuit optimization

Published In

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays

February 2013

294 pages

ISBN: 9781450318877

DOI: 10.1145/2435264

General Chair: Brad Hutchings, Brigham Young University, USA

Program Chair: Vaughn Betz, University of Toronto, Canada

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGDA: ACM Special Interest Group on Design Automation

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

FPGA '13

Sponsor:

SIGDA

FPGA '13: The 2013 ACM/SIGDA International Symposium on Field Programmable Gate Arrays

February 11 - 13, 2013

Monterey, California, USA

Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%

Cited By

Usui M, Takamaeda-Yamazaki S (2023). High-Level Synthesis of Memory Systems for Decoupled Data Orchestration. Applied Reconfigurable Computing. Architectures, Tools, and Applications, pages 3-18. Online publication date: 16-Sep-2023.
https://doi.org/10.1007/978-3-031-42921-7_1

Zhang P, Huang M, Xiao B, Huang H, Cong J (2015). CMOST. Proceedings of the 52nd Annual Design Automation Conference, pages 1-6. Online publication date: 7-Jun-2015.
https://dl.acm.org/doi/10.1145/2744769.2744807

Hussain T, Sonmez N, Palomar O, Unsal O, Cristal A, Ayguade E, Valero M, Gursal S (2014). PAMS: Pattern Aware Memory System for embedded systems. 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14), pages 1-7. Online publication date: Dec-2014.
https://doi.org/10.1109/ReConFig.2014.7032544

Cheng S, Wawrzynek J (2014). Architectural synthesis of computational pipelines with decoupled memory access. 2014 International Conference on Field-Programmable Technology (FPT), pages 83-90. Online publication date: Dec-2014.
https://doi.org/10.1109/FPT.2014.7082758
