research-article

Open access

Impact of Parallelism and Memory Architecture on FPGA Communication Energy

Authors:

André DeHon

ACM Transactions on Reconfigurable Technology and Systems (TRETS), Volume 9, Issue 4

Article No.: 30, pages 1-23

https://doi.org/10.1145/2857057

Published: 22 August 2016

Abstract

The energy in FPGA computations is dominated by data communication energy, either in the form of memory references or data movement on interconnect. In this article, we explore how to use data placement and parallelism to reduce communication energy. We show that parallelism can reduce energy and that the optimal level of parallelism increases with the problem size. We further explore how FPGA memory architecture (memory block size(s), memory banking, and spacing between memory banks) can impact communication energy, and we determine how to organize the memory architecture to guarantee that the energy overhead compared to the optimally matched architecture for the design is never more than 60%. We specifically show that an architecture with 32-bit-wide, 16Kb internally banked memories placed every 8 columns of logic blocks (each containing ten 4-LUTs) is within 61% of the optimally matched architecture across the VTR 7 benchmark set and a set of parallelism-tunable benchmarks. Without internal banking, the best achievable worst-case overhead is 98%, obtained with an architecture with 32-bit-wide, 8Kb memories placed every 9 columns, roughly comparable to the memory organization on the Cyclone V (where memories are placed about every 10 columns). Monolithic 32-bit-wide, 16Kb memories placed every 10 columns (comparable to the 18Kb and 20Kb memories used in Virtex-4 and Stratix V FPGAs) have a 180% worst-case energy overhead. Furthermore, we show practical cases where designs mapped for optimal parallelism use 4.7× less energy than designs using a single processing element.
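The worst-case overhead figures above can be made concrete with a small calculation. The sketch below assumes one reading of the metric: for each design, a fixed architecture's overhead is its energy divided by the energy on the best-matched candidate architecture, minus one; the worst-case overhead is the maximum over the benchmark set; and the most robust architecture is the one that minimizes that maximum. The function names, architecture labels, and energy values are hypothetical and are not taken from the article's experiments or tool flow.

```python
"""Minimax selection of a memory architecture by worst-case energy overhead.

Hypothetical sketch: the energy values are invented. In practice each entry
would come from mapping a design onto a candidate architecture and estimating
its energy. Every architecture is assumed to have an entry for every design.
"""

from typing import Dict

# energy[arch][design] -> total energy of `design` on `arch` (arbitrary units)
Energy = Dict[str, Dict[str, float]]


def overhead(energy: Energy, arch: str, design: str) -> float:
    """Fractional energy overhead of `arch` relative to the best-matched
    candidate architecture for `design` (0.0 means `arch` is the best match)."""
    best = min(per_design[design] for per_design in energy.values())
    return energy[arch][design] / best - 1.0


def worst_case_overhead(energy: Energy, arch: str) -> float:
    """Largest overhead of `arch` over every design in the benchmark set."""
    return max(overhead(energy, arch, design) for design in energy[arch])


def most_robust_arch(energy: Energy) -> str:
    """Candidate architecture whose worst-case overhead is smallest (minimax)."""
    return min(energy, key=lambda arch: worst_case_overhead(energy, arch))


if __name__ == "__main__":
    # Hypothetical energies for three designs on three candidate architectures.
    energy: Energy = {
        "arch_A": {"design_1": 1.00, "design_2": 1.50, "design_3": 2.00},
        "arch_B": {"design_1": 1.20, "design_2": 1.00, "design_3": 3.00},
        "arch_C": {"design_1": 1.10, "design_2": 1.20, "design_3": 1.30},
    }
    for arch in energy:
        print(f"{arch}: worst-case overhead {worst_case_overhead(energy, arch):.0%}")
    print("most robust:", most_robust_arch(energy))
```

With these made-up numbers, each architecture is the best match for one design, but arch_C is never far from the best match, so it has the smallest worst-case overhead (about 20%, versus about 54% for arch_A and 131% for arch_B). This is the kind of bound the abstract describes: a single fixed architecture chosen so that no design pays more than a stated overhead.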

Index Terms

Computer systems organization → Architectures → Other architectures → Reconfigurable computing

Information

Published In

ACM Transactions on Reconfigurable Technology and Systems Volume 9, Issue 4

Regular Papers and Special Section on Field Programmable Gate Arrays (FPGA) 2015

September 2016

161 pages

ISSN:1936-7406

EISSN:1936-7414

DOI:10.1145/2984740

Editor:
Steve Wilton
Department of Electrical and Computer Engineering, University of British Columbia, Kaiser 4112, 5500-2332 Main Mall, Vancouver, BC V6T 1Z4, Canada

Copyright © 2016 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 22 August 2016

Accepted: 01 December 2015

Received: 01 July 2015

Published in TRETS Volume 9, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

DARPA/CMO
VIPER program at the University of Pennsylvania

Article Metrics

Total citations: 6
Total downloads: 563
Downloads (last 12 months): 56
Downloads (last 6 weeks): 7

Reflects downloads up to 13 Sep 2024.
Cited By

Chiranjeevi G, Kulkarni S (2023). Reconfigurable Platform Pre-Processing MAC Unit Design: For Image Processing Core Architecture in Restoration Applications. Latest Advances and New Visions of Ontology in Information Science. Online publication date: 28-Jun-2023.
https://doi.org/10.5772/intechopen.108139
Bailey D (2023). Design Constraints. Design for Embedded Image Processing on FPGAs, 77-104. Online publication date: 5-Sep-2023.
https://doi.org/10.1002/9781119819820.ch4
Stabile R, Dabos G, Vagionas C, Shi B, Calabretta N, Pleros N (2021). Neuromorphic photonics: 2D or not 2D? Journal of Applied Physics 129:20. Online publication date: 24-May-2021.
https://doi.org/10.1063/5.0047946
Chiranjeevi G, Kulkarni S (2021). Pre-processing Block Hardware Architecture in Image Processing Using Reconfigurable Platform. Second International Conference on Image Processing and Capsule Networks, 138-145. Online publication date: 10-Sep-2021.
https://doi.org/10.1007/978-3-030-84760-9_13
Nahmias M, de Lima T, Tait A, Peng H, Shastri B, Prucnal P (2020). Photonic Multiply-Accumulate Operations for Neural Networks. IEEE Journal of Selected Topics in Quantum Electronics 26:1, 1-18. Online publication date: Jan-2020.
https://doi.org/10.1109/JSTQE.2019.2941485
Garcia P, Bhowmik D, Stewart R, Michaelson G, Wallace A (2019). Optimized Memory Allocation and Power Minimization for FPGA-Based Image Processing. Journal of Imaging 5:1, 7. Online publication date: 1-Jan-2019.
https://doi.org/10.3390/jimaging5010007
