An energy-efficient data cache organization for embedded processors with virtual memory is proposed. Application knowledge regarding memory references is used to eliminate most tag translations. A novel tagging scheme is introduced, where both virtual and physical tags coexist. Physical tags and special handling of superset index bits are only used for references to shared regions in order to avoid cache inconsistency. By eliminating the need for most address translations on cache access, a significant power reduction is achieved. We outline an efficient hardware architecture, where the application information is captured in a reprogrammable way and the cache is minimally modified.
Zhou and Petrov propose a cache architecture that aims to provide fast data access and low power consumption.
The memory hierarchy of modern computer systems includes a memory that is faster but smaller than main memory, called the cache. Caches reduce the average latency of memory accesses and can be organized in multiple levels, where size increases and speed decreases with each level.
The processor accesses the cache in two steps: cache indexing and tag comparison. In cache indexing, the least significant bits of the memory address select a cache set, where a set consists of one (for direct-mapped caches) or several (for set-associative caches) cache lines. Each cache line holds data, a tag, and state bits, with the tag storing the upper bits of the memory address of the cached data. During tag comparison, the tags of all lines in the selected set are compared against the corresponding bits of the memory address. If a match is found, a cache hit occurs and the data from the cache is used.
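To make the two steps concrete, the following C sketch shows how the index and tag are derived from an address and how the tag comparison decides a hit. The geometry (32-byte lines, 256 sets, 4 ways) and all identifiers are hypothetical, chosen only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical cache geometry: 32-byte lines, 256 sets, 4 ways. */
#define LINE_BITS 5          /* log2(32): byte offset within a line */
#define SET_BITS  8          /* log2(256): index bits               */
#define WAYS      4

typedef struct {
    bool     valid;
    uint32_t tag;            /* upper address bits of the cached line */
    uint8_t  data[1 << LINE_BITS];
} cache_line_t;

static cache_line_t cache[1 << SET_BITS][WAYS];

/* Step 1 (indexing): the low-order bits above the line offset select a set.
 * Step 2 (tag comparison): the remaining upper bits are compared against
 * the tags of every line in that set. */
bool cache_lookup(uint32_t addr, cache_line_t **hit_line)
{
    uint32_t index = (addr >> LINE_BITS) & ((1u << SET_BITS) - 1);
    uint32_t tag   = addr >> (LINE_BITS + SET_BITS);

    for (int way = 0; way < WAYS; way++) {
        cache_line_t *line = &cache[index][way];
        if (line->valid && line->tag == tag) {
            *hit_line = line;   /* cache hit: use the cached data */
            return true;
        }
    }
    return false;               /* cache miss */
}
```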
In systems with virtual memory, the processor issues virtual addresses that are translated into physical addresses using a combination of software and hardware. The virtual address space is divided into virtual pages, and the physical address space is divided into page frames. A special structure in memory, called the page table, maps virtual page numbers to physical page numbers for each process, and a dedicated cache, the translation lookaside buffer (TLB), caches page table entries: "TLB is usually implemented as a highly associative cache structure which consumes a significant amount of power."
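As a minimal illustration of this translation path, the C sketch below assumes 4 KB pages and a small fully associative TLB; the page_table_walk helper and all sizes are hypothetical stand-ins for the OS/hardware page-table walker.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical parameters: 4 KB pages, 32-entry fully associative TLB. */
#define PAGE_BITS   12
#define TLB_ENTRIES 32

typedef struct {
    bool     valid;
    uint32_t vpn;   /* virtual page number  */
    uint32_t ppn;   /* physical page number */
} tlb_entry_t;

static tlb_entry_t tlb[TLB_ENTRIES];

/* Hypothetical walker that reads the in-memory page table on a TLB miss. */
extern uint32_t page_table_walk(uint32_t vpn);

uint32_t translate(uint32_t vaddr)
{
    uint32_t vpn    = vaddr >> PAGE_BITS;
    uint32_t offset = vaddr & ((1u << PAGE_BITS) - 1);

    /* The TLB caches recently used page table entries.  In hardware every
     * entry is searched in parallel, which is why a highly associative TLB
     * is expensive in power. */
    for (int i = 0; i < TLB_ENTRIES; i++) {
        if (tlb[i].valid && tlb[i].vpn == vpn)
            return (tlb[i].ppn << PAGE_BITS) | offset;   /* TLB hit */
    }

    /* TLB miss: walk the page table (TLB refill and replacement omitted). */
    return (page_table_walk(vpn) << PAGE_BITS) | offset;
}
```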
The memory address used for indexing and for tagging the cache can be either the virtual or the physical address. If both are physical, the architecture is called a physical cache; otherwise, it is a virtual cache. The most common virtual caches are those indexed and tagged with virtual address bits (V/V caches) and those indexed with virtual bits but tagged with physical bits (V/P caches).
Physical caches require address translation to complete before cache indexing on every memory access. The TLB is accessed for this purpose, which incurs both a performance penalty (the TLB sits on the memory access path) and a power overhead (the TLB itself consumes power on every access).
In contrast, V/V caches have the advantage that a cache access requires no address translation (and thus no TLB access), which results in fast access and low power consumption. However, V/V caches suffer from potential cache consistency problems. These can occur when the operating system changes a virtual-to-physical page mapping, or when multiple processes share physical memory (that is, parts of the virtual address spaces of two processes are mapped to the same physical memory). The possible consistency problems are synonyms, aliases, homonyms, and cache coherence (Cekleov and Dubois define them in [1]). In uniprocessor systems, cache coherence problems can occur when synonyms for shared writable data exist. Since the contents of instruction caches are not modified by processes, V/V caches can safely be used as instruction caches. The homonym problem is solved by extending the virtual tags with the ID of the process that issues the virtual address.
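To see concretely why synonyms are dangerous in a V/V data cache, the short program below (reusing the hypothetical geometry from the earlier lookup sketch, with made-up addresses) computes the purely virtual index and tag for two virtual addresses that the operating system could map to the same physical page frame: the two accesses get different virtual tags, so the same physical line can be cached as two independent copies, and a write through one copy is not seen through the other.

```c
#include <stdint.h>
#include <stdio.h>

/* Same hypothetical geometry as before: 32-byte lines, 256 sets. */
#define LINE_BITS 5
#define SET_BITS  8

/* Index and tag exactly as a V/V cache computes them: virtual bits only. */
static uint32_t v_index(uint32_t va) { return (va >> LINE_BITS) & ((1u << SET_BITS) - 1); }
static uint32_t v_tag(uint32_t va)   { return va >> (LINE_BITS + SET_BITS); }

int main(void)
{
    /* Hypothetical synonym pair: two virtual addresses, in two processes,
     * that the OS maps to the same physical page frame.  They share the
     * page offset (0x040) but have different virtual page numbers. */
    uint32_t va1 = 0x10002040;   /* mapping in process A */
    uint32_t va2 = 0x7f3f2040;   /* mapping in process B */

    /* The virtual tags differ, so one physical line can live in the cache
     * twice; a store through va1 leaves the copy reached through va2 stale. */
    printf("va1: set %u, tag 0x%x\n", v_index(va1), v_tag(va1));
    printf("va2: set %u, tag 0x%x\n", v_index(va2), v_tag(va2));
    return 0;
}
```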
In V/P caches, cache indexing proceeds in parallel with address translation, hiding part of the translation latency; tag comparison takes place once both have completed. V/P caches consume more power and are slower than V/V caches, but are faster than physical caches. Their advantage over V/V caches is that cache consistency problems are easily avoided, so V/P caches can safely be used as data caches.
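A rough sketch of this access sequence is shown below; in hardware the virtual indexing and the TLB translation run concurrently rather than one after the other, and the translate and tag_match helpers are hypothetical (the former standing in for the TLB sketch above).

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BITS 5
#define SET_BITS  8

/* Hypothetical helpers: TLB-based translation (see the earlier sketch) and
 * a comparator over the tags stored in the selected set. */
extern uint32_t translate(uint32_t vaddr);
extern bool     tag_match(uint32_t set, uint32_t phys_tag);

/* V/P access: the set is selected with virtual bits while the TLB translates
 * the page number; only the final comparison needs the physical address.
 * In hardware the first two steps proceed in parallel, hiding part of the
 * translation latency. */
bool vp_cache_lookup(uint32_t vaddr)
{
    uint32_t set      = (vaddr >> LINE_BITS) & ((1u << SET_BITS) - 1); /* virtual index    */
    uint32_t paddr    = translate(vaddr);                              /* TLB, in parallel */
    uint32_t phys_tag = paddr >> (LINE_BITS + SET_BITS);

    return tag_match(set, phys_tag);
}
```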
The cache architecture proposed by the authors aims to combine the low power consumption and fast access of V/V caches with the freedom from consistency problems of V/P caches. The authors introduce a hybrid tagging scheme that uses virtual tags for private data and physical tags for shared data, relying on application-specific information to decide which kind of tag to use for a given virtual page.
Shared pages are identified through a combination of source-code annotation, compiler support, and additional hardware. The application source code declares shared data using #pragma directives. A portion of the virtual address space is reserved for shared data, and the compiler maps data declared as shared into that reserved region. Finally, combinational logic (for example, a three-input AND gate if the reserved region is identified by ones in the three most significant address bits) detects accesses to shared pages, as the sketch below illustrates.
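The sketch below mirrors that detection logic and the resulting hybrid tag choice in C. The reserved-region boundary (three most significant address bits all ones), the function names, and the fallback to translate() are illustrative assumptions, not the paper's exact layout.

```c
#include <stdbool.h>
#include <stdint.h>

#define LINE_BITS 5
#define SET_BITS  8

/* Hypothetical helper from the translation sketch above. */
extern uint32_t translate(uint32_t vaddr);

/* Assumed layout: shared data is linked into the region whose three most
 * significant virtual address bits are all ones (0xE0000000 and above).
 * In hardware this test is just a three-input AND gate on those bits. */
static bool is_shared(uint32_t vaddr)
{
    return (vaddr >> 29) == 0x7;
}

/* Hybrid tagging: private pages use the virtual tag, so no TLB access is
 * needed; shared pages fall back to the physical tag so that consistency
 * across processes is preserved. */
uint32_t select_tag(uint32_t vaddr)
{
    uint32_t addr = is_shared(vaddr) ? translate(vaddr) : vaddr;
    return addr >> (LINE_BITS + SET_BITS);
}
```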
The merits of the proposed technique are questionable. First, the annotation requirement means that existing applications must be modified. Second, applying the technique requires compiler support and changes to the processor logic.
The authors' presentation is not very clear: some ideas are stated repeatedly with slightly different phrasing, and the paper contains technical mistakes. For example, the authors incorrectly define cache aliasing as "a situation where the same virtual address from different tasks is mapped to different physical addresses." In fact, this defines homonyms [1].
Bardizbanyan, A., Gavin, P., Whalley, D., Sjalander, M., Larsson-Edefors, P., McKee, S., and Stenstrom, P. 2013. Improving data access efficiency by using a tagless access buffer (TAB). In Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO), 1--11. DOI: 10.1109/CGO.2013.6495003.
Basu, A., Hill, M., Swift, M., Lu, S., and Torrellas, J. 2012. Reducing memory reference energy with opportunistic virtual caching. In Proceedings of the 39th Annual International Symposium on Computer Architecture (ISCA), 297--308. DOI: 10.5555/2337159.2337194.
Yu, B., Dong, S., Ma, Y., Lin, T., Wang, Y., Chen, S., and Goto, S. 2011. Network flow-based simultaneous retiming and slack budgeting for low power design. In Proceedings of the 16th Asia and South Pacific Design Automation Conference, 473--478. DOI: 10.5555/1950815.1950913.
Zheng, L., Dong, M., Ota, K., Jin, H., Guo, S., and Ma, J. 2011. Energy efficiency of a multi-core processor by tag reduction. Journal of Computer Science and Technology 26, 3, 491--503. DOI: 10.1007/s11390-011-1149-0.
Zheng, L., Dong, M., Jin, H., Guo, M., Guo, S., and Tu, X. 2010. The core degree based tag reduction on chip multiprocessor to balance energy saving and performance overhead. In Proceedings of the 2010 IFIP International Conference on Network and Parallel Computing, 358--372. DOI: 10.5555/1882011.1882047.
Zheng, L., Dong, M., Ota, K., Li, H., Guo, S., and Guo, M. 2010. Exploring the limits of tag reduction for energy saving on a multi-core processor. In Proceedings of the 39th International Conference on Parallel Processing Workshops, 104--112. DOI: 10.1109/ICPPW.2010.26.
Zheng, L., Dong, M., Jin, H., Guo, M., Guo, S., and Tu, X. 2010. The core degree based tag reduction on chip multiprocessor to balance energy saving and performance overhead. In Network and Parallel Computing, 358--372. DOI: 10.1007/978-3-642-15672-4_30.