DOI: 10.1145/2435264.2435302

C-to-CoRAM: compiling perfect loop nests to the portable CoRAM abstraction

Published: 11 February 2013

Abstract

This paper presents initial work on developing a C compiler for the CoRAM FPGA computing abstraction. The presented effort focuses on compiling fixed-bound perfect loop nests that operate on large data sets in external DRAM. As required by the CoRAM abstraction, the compiler partitions source code into two separate implementation components: (1) hardware kernel pipelines to be mapped onto the reconfigurable logic fabric; and (2) control threads that express, in a C-like language, the sequencing and coordination of data transfers between the hardware kernels and external DRAM. The compiler performs optimizations to increase parallelism and use DRAM bandwidth efficiently. It can target different FPGA platforms that support the CoRAM abstraction, either natively in a future FPGA or in soft-logic on today's devices. The CoRAM abstraction provides a convenient high-level compilation target to simplify the task of design optimization and system generation. The compiler is evaluated using three test programs (matrix-matrix multiplication, k-nearest neighbor, and 2D convolution) on the Xilinx ML605 and the Altera DE4. Results show that our compiler is able to target the different platforms and effectively exploit their dissimilar capacities and features. Depending on the application, the compiler-generated implementations achieve performance ranging from a factor of 4 slower to a factor of 2 faster relative to hand-designed implementations, as measured on actual hardware.
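To make the term concrete, here is a minimal sketch of a fixed-bound perfect loop nest, using matrix-matrix multiplication (one of the three test programs named above). It is an illustration only, not code from the paper: the function name `matmul`, the size `N`, and the `float` element type are all assumptions.

```c
#include <stddef.h>

/* Illustrative problem size; an assumption, not a value from the paper. */
#define N 4

/*
 * A perfect loop nest with fixed bounds: every loop bound is a
 * compile-time constant, and the only statement sits in the body of
 * the innermost loop. The caller must zero-initialize C beforehand.
 */
void matmul(float A[N][N], float B[N][N], float C[N][N])
{
    for (size_t i = 0; i < N; i++)
        for (size_t j = 0; j < N; j++)
            for (size_t k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
}
```

In the split the abstract describes, the multiply-accumulate in the innermost body would presumably become the hardware kernel pipeline, while moving the operands between external DRAM and the kernel would fall to the control threads. That mapping is an interpretation of the abstract, not a description of the compiler's actual output.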

Cited By

  • (2023) High-Level Synthesis of Memory Systems for Decoupled Data Orchestration. Applied Reconfigurable Computing. Architectures, Tools, and Applications, pages 3-18. DOI: 10.1007/978-3-031-42921-7_1. Online publication date: 16-Sep-2023
  • (2015) CMOST. Proceedings of the 52nd Annual Design Automation Conference, pages 1-6. DOI: 10.1145/2744769.2744807. Online publication date: 7-Jun-2015
  • (2014) PAMS: Pattern Aware Memory System for embedded systems. 2014 International Conference on ReConFigurable Computing and FPGAs (ReConFig14), pages 1-7. DOI: 10.1109/ReConFig.2014.7032544. Online publication date: Dec-2014
  • (2014) Architectural synthesis of computational pipelines with decoupled memory access. 2014 International Conference on Field-Programmable Technology (FPT), pages 83-90. DOI: 10.1109/FPT.2014.7082758. Online publication date: Dec-2014

Published In

FPGA '13: Proceedings of the ACM/SIGDA international symposium on Field programmable gate arrays
February 2013
294 pages
ISBN: 9781450318877
DOI: 10.1145/2435264
Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. FPGA computing
  2. data reuse
  3. high-level synthesis
  4. loop optimization

Qualifiers

  • Research-article

Conference

FPGA '13
Acceptance Rates

Overall Acceptance Rate 125 of 627 submissions, 20%
