Tiny instruction caches for low power embedded systems

Authors:

Ann Gordon-Ross,

Susan Cotterell,

Frank Vahid

ACM Transactions on Embedded Computing Systems (TECS), Volume 2, Issue 4

Pages 449 - 481

https://doi.org/10.1145/950162.950163

Published: 01 November 2003

Abstract

Instruction caches have traditionally been used to improve software performance. Recently, several tiny instruction cache designs, including filter caches and dynamic loop caches, have been proposed to reduce software power instead. We propose several new tiny instruction cache designs, including preloaded loop caches and one-level and two-level hybrid dynamic/preloaded loop caches. We evaluate the existing and proposed designs on embedded system software benchmarks from both the Powerstone and MediaBench suites, on two different processor architectures, for a variety of technologies. We show that, on average, filter caching achieves the best instruction fetch energy reductions, 60–80%, but at the cost of about 20% performance degradation, which could also affect overall energy savings. We show that dynamic loop caching gives good instruction fetch energy savings of about 30%, but that if a designer is able to profile a program, preloaded loop caching can more than double those savings. We describe automated methods for quickly determining the best loop cache configuration, which are useful in a core-based design flow.
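The abstract's caveat that a performance penalty "could also affect overall energy savings" can be illustrated with a small back-of-the-envelope model. The sketch below is not the paper's evaluation method. It assumes a simple linear model in which instruction fetch accounts for some share of baseline energy and part of the remaining energy grows in proportion to execution time. The function name `overall_energy_ratio`, the parameters `fetch_share` and `time_scaled_share`, and all numeric inputs other than the abstract's approximate 70%, 30%, and 20% figures are hypothetical.

```python
def overall_energy_ratio(fetch_share, fetch_reduction, slowdown, time_scaled_share):
    """Return new_total_energy / baseline_total_energy under a linear model.

    fetch_share       -- ASSUMED fraction of baseline energy spent on instruction fetch
    fetch_reduction   -- fractional cut in fetch energy (0.7 means 70%)
    slowdown          -- fractional increase in execution time (0.2 means 20%)
    time_scaled_share -- ASSUMED fraction of non-fetch energy that grows with runtime
    """
    fractions = {
        "fetch_share": fetch_share,
        "fetch_reduction": fetch_reduction,
        "time_scaled_share": time_scaled_share,
    }
    for name, value in fractions.items():
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    if slowdown < 0.0:
        raise ValueError(f"slowdown must be non-negative, got {slowdown}")

    # Fetch energy shrinks by the stated reduction.
    fetch_energy = fetch_share * (1.0 - fetch_reduction)
    # Only the time-proportional part of the remaining energy grows with the slowdown.
    other_energy = (1.0 - fetch_share) * (1.0 + time_scaled_share * slowdown)
    return fetch_energy + other_energy


if __name__ == "__main__":
    # Hypothetical design points: (label, fetch_share, fetch_reduction, slowdown, time_scaled_share)
    scenarios = [
        ("filter cache, fetch-heavy system", 0.3, 0.7, 0.2, 0.5),
        ("dynamic loop cache, fetch-heavy system", 0.3, 0.3, 0.0, 0.5),
        ("filter cache, fetch-light system", 0.1, 0.7, 0.2, 1.0),
    ]
    for label, share, reduction, slow, scaled in scenarios:
        ratio = overall_energy_ratio(share, reduction, slow, scaled)
        print(f"{label}: total energy ratio = {ratio:.2f}")
```

Under these made-up inputs, the filter cache still saves more total energy than the dynamic loop cache when fetch dominates, but when fetch is a small share of the total and most of the remaining energy scales with runtime, the 20% slowdown outweighs the fetch savings and the ratio rises above 1. Real shares depend on the processor, memory hierarchy, and technology.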

Index Terms

Tiny instruction caches for low power embedded systems
1. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Information & Contributors

Information

Published In

ACM Transactions on Embedded Computing Systems Volume 2, Issue 4

November 2003

165 pages

ISSN:1539-9087

EISSN:1558-3465

DOI:10.1145/950162

Copyright © 2003 ACM.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Bibliometrics & Citations

Bibliometrics

Article Metrics

Total Citations: 22
Total Downloads: 1,625
Downloads (Last 12 months): 4
Downloads (Last 6 weeks): 0

Reflects downloads up to 10 Nov 2024

Citations

Cited By

Mohammadi M, Han S, Atoofian E, Baniasadi A, Aamodt T, Dally W (2020) Energy Efficient On-Demand Dynamic Branch Prediction Models. IEEE Transactions on Computers 69:3 (453-465). Online publication date: 1-Mar-2020
https://dl.acm.org/doi/10.1109/TC.2019.2956710
Huang S, Wang J, Wang D, Zhang M, You H (2018) A Thread-Saving Schedule with Graph Analysis for Parallel Deep Learning Applications on Embedded Systems. 2018 IEEE International Conference on Smart Cloud (SmartCloud) (111-115). Online publication date: Sep-2018
https://doi.org/10.1109/SmartCloud.2018.00026
Deshmukh A, Jothish M, Chandrasekaran K (2015) Optimized cryptographic algorithm for embedded systems. 2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT) (33-38). Online publication date: Oct-2015
https://doi.org/10.1109/ICATCCT.2015.7456850
Mittal S (2014) A survey of architectural techniques for improving cache power efficiency. Sustainable Computing: Informatics and Systems 4:1 (33-43). Online publication date: Mar-2014
https://doi.org/10.1016/j.suscom.2013.11.001
Karthika M, Rajasekaran C (2012) Scratchpad memory-global power optimization. International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012) (199-203). Online publication date: Mar-2012
https://doi.org/10.1109/ICPRIME.2012.6208343
Park J, Balfour J, Dally W, Kathail V, Tatge R, Barua R (2010) Fine-grain dynamic instruction placement for L0 scratch-pad memory. Proceedings of the 2010 international conference on Compilers, architectures and synthesis for embedded systems (137-146). Online publication date: 24-Oct-2010
https://dl.acm.org/doi/10.1145/1878921.1878943
Janapsatya A, Parameswaran S, Ignjatović A, Wakabayashi K (2009) HitME. Proceedings of the 2009 Asia and South Pacific Design Automation Conference (335-340). Online publication date: 19-Jan-2009
https://dl.acm.org/doi/10.5555/1509633.1509720
Janapsatya A, Parameswaran S, Ignjatovic A (2009) HitME: Low power Hit MEmory buffer for embedded systems. 2009 Asia and South Pacific Design Automation Conference (335-340). Online publication date: Jan-2009
https://doi.org/10.1109/ASPDAC.2009.4796503
Canedo A, Abderazek B, Sowa M (2009) Compiler Support for Code Size Reduction Using a Queue-Based Processor. Transactions on High-Performance Embedded Architectures and Compilers II (269-285). Online publication date: 22-Apr-2009
https://dl.acm.org/doi/10.1007/978-3-642-00904-4_14
Black-Schaffer D, Balfour J, Dally W, Parikh V, Park J (2008) Hierarchical Instruction Register Organization. IEEE Computer Architecture Letters 7:2 (41-44). Online publication date: 1-Jul-2008
https://dl.acm.org/doi/10.1109/L-CA.2008.7
