Impact of Parallelism and Memory Architecture on FPGA Communication Energy

Published: 22 August 2016

Abstract

The energy in FPGA computations is dominated by data communication energy, either in the form of memory references or data movement on interconnect. In this article, we explore how to use data placement and parallelism to reduce communication energy. We show that parallelism can reduce energy and that the optimal level of parallelism increases with the problem size. We further explore how FPGA memory architecture (memory block size(s), memory banking, and spacing between memory banks) can impact communication energy, and determine how to organize the memory architecture to guarantee that the energy overhead compared to the optimally matched architecture for the design is never more than 60%. We specifically show that an architecture with 32-bit-wide, 16Kb internally banked memories placed every 8 columns of 10 4-LUT logic blocks is within 61% of the optimally matched architecture across the VTR 7 benchmark set and a set of parallelism-tunable benchmarks. Without internal banking, the worst-case overhead is 98%, achieved with an architecture with 32-bit-wide, 8Kb memories placed every 9 columns, roughly comparable to the memory organization on the Cyclone V (where memories are placed about every 10 columns). Monolithic 32-bit-wide, 16Kb memories placed every 10 columns (comparable to the 18Kb and 20Kb memories used in Virtex 4 and Stratix V FPGAs) have a 180% worst-case energy overhead. Furthermore, we show practical cases where designs mapped for optimal parallelism use 4.7× less energy than designs using a single processing element.
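
The claim that the optimal level of parallelism grows with problem size can be illustrated with a toy model. The sketch below is not the article's energy model: it assumes, purely for illustration, that the energy of a local memory access grows with the square root of that memory's capacity and that interconnect energy per access grows with the side length of a square array of processing elements. The function names, the constants `c_mem` and `c_wire`, and the candidate parallelism levels are all hypothetical.

```python
import math

def communication_energy(n_items, n_pes, c_mem=1.0, c_wire=1.0):
    """Toy communication energy (arbitrary units) for n_items accesses on n_pes PEs."""
    # Assumption: each PE holds n_items / n_pes words, and the energy of one
    # access grows with the square root of that local memory's capacity.
    local_access = c_mem * math.sqrt(n_items / n_pes)
    # Assumption: each access also pays an interconnect cost proportional to
    # the side length of a sqrt(P) x sqrt(P) array of PEs.
    interconnect = c_wire * math.sqrt(n_pes)
    return n_items * (local_access + interconnect)

def best_parallelism(n_items, candidates=tuple(2 ** k for k in range(11))):
    """Return the candidate PE count with the lowest toy energy."""
    return min(candidates, key=lambda p: communication_energy(n_items, p))

if __name__ == "__main__":
    for n in (1024, 4096, 65536):
        p = best_parallelism(n)
        saving = communication_energy(n, 1) / communication_energy(n, p)
        print(f"N={n}: best P={p}, energy ratio vs. single PE = {saving:.2f}")
```

Under these assumptions the minimum falls where the two terms balance, at roughly P = (c_mem / c_wire) · sqrt(N), so the preferred number of processing elements rises as the problem grows. The ratios the script prints are properties of this toy model only and should not be compared with the 4.7× figure reported in the abstract.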

    Published In

    ACM Transactions on Reconfigurable Technology and Systems, Volume 9, Issue 4
    Regular Papers and Special Section on Field Programmable Gate Arrays (FPGA) 2015
    September 2016
    161 pages
    ISSN:1936-7406
    EISSN:1936-7414
    DOI:10.1145/2984740
    Editor: Steve Wilton
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 22 August 2016
    Accepted: 01 December 2015
    Received: 01 July 2015
    Published in TRETS Volume 9, Issue 4

    Author Tags

    1. FPGA
    2. architecture
    3. banking
    4. communication
    5. energy
    6. memory
    7. power

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Funding Sources

    • DARPA/CMO
    • VIPER program at the University of Pennsylvania

    Cited By

    • (2023) Reconfigurable Platform Pre-Processing MAC Unit Design: For Image Processing Core Architecture in Restoration Applications. Latest Advances and New Visions of Ontology in Information Science. DOI: 10.5772/intechopen.108139. Online publication date: 28-Jun-2023
    • (2023) Design Constraints. Design for Embedded Image Processing on FPGAs, 77-104. DOI: 10.1002/9781119819820.ch4. Online publication date: 5-Sep-2023
    • (2021) Neuromorphic photonics: 2D or not 2D? Journal of Applied Physics 129:20. DOI: 10.1063/5.0047946. Online publication date: 24-May-2021
    • (2021) Pre-processing Block Hardware Architecture in Image Processing Using Reconfigurable Platform. Second International Conference on Image Processing and Capsule Networks, 138-145. DOI: 10.1007/978-3-030-84760-9_13. Online publication date: 10-Sep-2021
    • (2020) Photonic Multiply-Accumulate Operations for Neural Networks. IEEE Journal of Selected Topics in Quantum Electronics 26:1, 1-18. DOI: 10.1109/JSTQE.2019.2941485. Online publication date: Jan-2020
    • (2019) Optimized Memory Allocation and Power Minimization for FPGA-Based Image Processing. Journal of Imaging 5:1 (7). DOI: 10.3390/jimaging5010007. Online publication date: 1-Jan-2019
