Energy savings and speedups from partitioning critical software loops to hardware in embedded systems

Published: 01 February 2004

Abstract

We present results of extensive hardware/software partitioning experiments on numerous benchmarks. We describe our loop-oriented partitioning methodology for moving critical code from software to hardware. Our benchmarks included programs from PowerStone, MediaBench, and NetBench. Our experiments included estimated results for partitioning using an 8051 8-bit microcontroller or a 32-bit MIPS microprocessor for the software, and using on-chip configurable logic or custom application-specific integrated circuit hardware for the hardware. Additional experiments involved actual measurements taken from several physical implementations of hardware/software partitionings on real single-chip microprocessor/configurable-logic devices. We also estimated results assuming voltage-scalable processors. We provide performance, energy, and size data for all of the experiments. We found that the benchmarks spent an average of 80% of their execution time in only 3% of their code, amounting to only about 200 bytes of critical code. For various experiments, we found that moving critical code to hardware resulted in average speedups of 3 to 5 and average energy savings of 35% to 70%, with average hardware requirements of only 5000 to 10,000 gates. To our knowledge, these experiments represent the most comprehensive hardware/software partitioning study published to date.
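
The relationship between the 80% figure and the reported speedups of 3 to 5 can be illustrated with Amdahl's law. The sketch below is not the authors' methodology, whose numbers come from estimation and measurement. It only shows the upper bound that follows if the critical loops account for 80% of execution time and the remaining 20% stays in software. The function name and the loop speedup values (5, 10, 20, 100) are hypothetical and chosen for illustration.

```python
def overall_speedup(critical_fraction: float, loop_speedup: float) -> float:
    """Amdahl's law: whole-program speedup when only the critical
    fraction of execution time is accelerated by loop_speedup."""
    if not 0.0 <= critical_fraction <= 1.0:
        raise ValueError("critical_fraction must be between 0 and 1")
    if loop_speedup <= 0.0:
        raise ValueError("loop_speedup must be positive")
    remaining = 1.0 - critical_fraction          # time left in software
    accelerated = critical_fraction / loop_speedup  # time spent in hardware
    return 1.0 / (remaining + accelerated)


if __name__ == "__main__":
    # 0.80 is the average critical-loop time share stated in the abstract;
    # the per-loop hardware speedups are assumed values, not reported data.
    for k in (5, 10, 20, 100):
        print(f"loop speedup {k:>3}x -> overall {overall_speedup(0.80, k):.2f}x")
```

With 80% of the time in critical loops, the overall speedup approaches but never reaches 1 / 0.2 = 5 however fast the hardware is, which is consistent with the 3 to 5 range reported in the abstract.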

Published In

ACM Transactions on Embedded Computing Systems, Volume 3, Issue 1
February 2004
232 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/972627

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. FPGA
2. Hardware/software partitioning
3. embedded systems
4. low energy
5. platforms
6. speedup
7. synthesis
