
Tiny instruction caches for low power embedded systems

Published: 01 November 2003

Abstract

Instruction caches have traditionally been used to improve software performance. Recently, several tiny instruction cache designs, including filter caches and dynamic loop caches, have been proposed to reduce software power instead. We propose several new tiny instruction cache designs, including preloaded loop caches and one-level and two-level hybrid dynamic/preloaded loop caches. We evaluate the existing and proposed designs on embedded system software benchmarks from both the Powerstone and MediaBench suites, on two different processor architectures, and for a variety of technologies. We show that, on average, filter caching achieves the best instruction fetch energy reductions, 60–80%, but at the cost of about 20% performance degradation, which could also reduce overall energy savings. Dynamic loop caching gives good instruction fetch energy savings of about 30%, but if a designer is able to profile a program, preloaded loop caching can more than double those savings. We describe automated methods for quickly determining the best loop cache configuration; these methods are useful in a core-based design flow.
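The abstract contrasts dynamic loop caches, which fill themselves at run time, with preloaded loop caches, whose contents a designer chooses offline by profiling. The Python sketch below is a simplified, trace-driven illustration of that difference, not the authors' simulator or their exact controller design. It assumes one trace entry per fetch and one address unit per instruction, treats any taken backward branch whose body fits in the cache as a short backward branch, and ignores fill writes and controller overhead. The function names, the `capacity`, `regions`, and `lc_ratio` parameters, and the example traces are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# Hypothetical model: one trace entry per instruction fetch, one address
# unit per instruction, loop regions given as inclusive (start, end) pairs.
Region = Tuple[int, int]


def dynamic_loop_cache_hits(trace: Sequence[int], capacity: int) -> int:
    """Count fetches served by a simplified dynamic loop cache.

    A taken short backward branch (sbb) whose loop body fits in `capacity`
    instructions starts a fill pass served by L1. If that same sbb is taken
    again, the cache becomes active and serves fetches until control leaves
    the body or takes any other branch.
    """
    state, start, end = "idle", -1, -1
    hits = 0
    for i, pc in enumerate(trace):
        if state == "active":
            hits += 1  # this fetch comes from the loop cache, not L1
        if i + 1 == len(trace):
            break
        nxt = trace[i + 1]
        if nxt <= pc and pc - nxt + 1 <= capacity:
            # Taken sbb: the same loop again goes active; a new loop refills.
            if state != "idle" and (nxt, pc) == (start, end):
                state = "active"
            else:
                state, start, end = "fill", nxt, pc
        elif state != "idle" and nxt == pc + 1 and start <= nxt <= end:
            pass  # sequential fetch inside the loop body: keep the state
        else:
            state = "idle"  # loop exit, long backward branch, or other branch
    return hits


def preloaded_loop_cache_hits(trace: Sequence[int], regions: List[Region]) -> int:
    """Count fetches served by a simplified preloaded loop cache.

    `regions` are loop bodies chosen offline (e.g. by profiling) and loaded
    before execution. A taken backward branch to a region's start activates
    it; the cache then serves every fetch inside the region, internal
    branches included, until control leaves it.
    """
    active: Optional[Region] = None
    hits = 0
    for i, pc in enumerate(trace):
        if active is not None:
            hits += 1
        if i + 1 == len(trace):
            break
        nxt = trace[i + 1]
        if active is not None and active[0] <= nxt <= active[1]:
            continue  # still inside the active preloaded region
        active = None
        if nxt <= pc:
            # Activate only on a backward branch to a preloaded loop start.
            active = next((r for r in regions if r[0] == nxt and pc <= r[1]), None)
    return hits


def relative_fetch_energy(hits: int, total: int, lc_ratio: float) -> float:
    """Instruction fetch energy relative to an L1-only system.

    `lc_ratio` is an assumed input: energy of one loop-cache fetch divided
    by energy of one L1 fetch. Fill writes and controller overhead are ignored.
    """
    return ((total - hits) + hits * lc_ratio) / total


if __name__ == "__main__":
    # A 4-instruction loop (10..13) run 5 times after straight-line code.
    simple = list(range(10)) + [10, 11, 12, 13] * 5 + [14, 15]
    # A 6-instruction loop (20..25) that skips address 23 every iteration.
    branchy = [20, 21, 22, 24, 25] * 4 + [26]
    print(dynamic_loop_cache_hits(simple, 8), preloaded_loop_cache_hits(simple, [(10, 13)]))  # 12 16
    print(dynamic_loop_cache_hits(branchy, 8), preloaded_loop_cache_hits(branchy, [(20, 25)]))  # 0 15
```

On the second trace, the internal forward branch resets the dynamic model on every iteration, so it never becomes active, while the preloaded model keeps serving fetches once the region is entered. Under the assumptions above, this illustrates one way profiling-based preloading can outperform a dynamic loop cache. `relative_fetch_energy` converts a hit count into a fetch-energy estimate once a per-access energy ratio is supplied from a memory energy model.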

    Information

    Published In

    ACM Transactions on Embedded Computing Systems, Volume 2, Issue 4
    November 2003
    165 pages
    ISSN:1539-9087
    EISSN:1558-3465
    DOI:10.1145/950162

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published in TECS Volume 2, Issue 4

    Author Tags

    1. loop cache
    2. architecture tuning
    3. embedded systems
    4. filter cache
    5. fixed program
    6. instruction cache
    7. low energy
    8. low power

    Qualifiers

    • Article

    Bibliometrics & Citations

    Bibliometrics

    Article Metrics

    • Downloads (last 12 months): 4
    • Downloads (last 6 weeks): 0
    Reflects downloads up to 10 Nov 2024

    Citations

    Cited By

    • (2020) Energy Efficient On-Demand Dynamic Branch Prediction Models. IEEE Transactions on Computers 69:3, 453–465. DOI: 10.1109/TC.2019.2956710. Online publication date: 1-Mar-2020.
    • (2018) A Thread-Saving Schedule with Graph Analysis for Parallel Deep Learning Applications on Embedded Systems. 2018 IEEE International Conference on Smart Cloud (SmartCloud), 111–115. DOI: 10.1109/SmartCloud.2018.00026. Online publication date: Sep-2018.
    • (2015) Optimized cryptographic algorithm for embedded systems. 2015 International Conference on Applied and Theoretical Computing and Communication Technology (iCATccT), 33–38. DOI: 10.1109/ICATCCT.2015.7456850. Online publication date: Oct-2015.
    • (2014) A survey of architectural techniques for improving cache power efficiency. Sustainable Computing: Informatics and Systems 4:1, 33–43. DOI: 10.1016/j.suscom.2013.11.001. Online publication date: Mar-2014.
    • (2012) Scratchpad memory-global power optimization. International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012), 199–203. DOI: 10.1109/ICPRIME.2012.6208343. Online publication date: Mar-2012.
    • (2010) Fine-grain dynamic instruction placement for L0 scratch-pad memory. Proceedings of the 2010 International Conference on Compilers, Architectures and Synthesis for Embedded Systems, 137–146. DOI: 10.1145/1878921.1878943. Online publication date: 24-Oct-2010.
    • (2009) HitME. Proceedings of the 2009 Asia and South Pacific Design Automation Conference, 335–340. DOI: 10.5555/1509633.1509720. Online publication date: 19-Jan-2009.
    • (2009) HitME: Low power Hit MEmory buffer for embedded systems. 2009 Asia and South Pacific Design Automation Conference, 335–340. DOI: 10.1109/ASPDAC.2009.4796503. Online publication date: Jan-2009.
    • (2009) Compiler Support for Code Size Reduction Using a Queue-Based Processor. Transactions on High-Performance Embedded Architectures and Compilers II, 269–285. DOI: 10.1007/978-3-642-00904-4_14. Online publication date: 22-Apr-2009.
    • (2008) Hierarchical Instruction Register Organization. IEEE Computer Architecture Letters 7:2, 41–44. DOI: 10.1109/L-CA.2008.7. Online publication date: 1-Jul-2008.
