
Coupling compiler-enabled and conventional memory accessing for energy efficiency

Published: 01 May 2004

Abstract

This article presents Cool-Mem, a family of memory system architectures that integrate conventional memory system mechanisms, energy-aware address translation, and compiler-enabled cache disambiguation techniques to reduce energy consumption in general-purpose architectures. The solutions provided in this article leverage interlayer tradeoffs between the architecture, compiler, and operating system layers. Cool-Mem achieves power reduction by statically matching memory operations with energy-efficient cache and virtual memory access mechanisms. It combines statically speculative cache access modes, a dynamic content-addressable memory (CAM)-based Tag-Cache used as a backup for statically mispredicted accesses, different conventional multilevel associative cache organizations, embedded protection checking along all cache access mechanisms, and architectural organizations that reduce the power consumed by address translation in virtual memory. Because our approach is based on speculative static information, a superset of the predictable program information available at compile time, it removes the burden of provable correctness from the compiler analysis passes that extract static information. This makes Cool-Mem highly practical and applicable to large and complex applications, without limitations due to the complexity of our compiler passes or the presence of precompiled static libraries. In an extensive evaluation on both SPEC2000 and MediaBench applications, we obtain total processor energy savings of 6% to 19%, with performance ranging from a 1.5% degradation to a 6% improvement, for the applications studied. We have also compared Cool-Mem to several prior techniques and found that it performs better in almost all cases.
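The abstract does not spell out how a statically speculative access falls back to the Tag-Cache, so the following is only a toy Python model of the general idea, not the article's design. It assumes that each memory operation may carry a compiler-supplied way hint, that a small LRU table (standing in for the CAM-based Tag-Cache) remembers recent line-to-way mappings, and that a miss in both falls back to a conventional lookup of all ways. The class name `ToyCache`, the relative energy costs, and the cache geometry are hypothetical and chosen purely for illustration.

```python
from collections import OrderedDict

# Hypothetical relative energy costs (arbitrary units), chosen only for illustration.
COST_ONE_WAY = 1.0    # probe a single way (its tag and data)
COST_TAG_CACHE = 0.3  # probe the small CAM-style backup table
COST_ALL_WAYS = 4.0   # conventional lookup reading all ways in parallel


class ToyCache:
    """Set-associative cache model: static way hint -> backup table -> full lookup."""

    def __init__(self, num_sets=16, ways=4, line_bytes=32, tag_cache_size=8):
        self.num_sets, self.ways, self.line_bytes = num_sets, ways, line_bytes
        self.sets = [OrderedDict() for _ in range(num_sets)]  # per set: tag -> way, LRU first
        self.tag_cache = OrderedDict()  # hypothetical backup table: line number -> way, LRU first
        self.tag_cache_size = tag_cache_size
        self.energy = 0.0
        self.paths = {"hint": 0, "tag_cache": 0, "full": 0}

    def _touch(self, index, tag):
        """Update LRU order on a hit, or fill the line on a miss; return its way."""
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)
            return s[tag]
        if len(s) < self.ways:
            way = len(s)  # a free way is still available
        else:
            old_tag, way = s.popitem(last=False)  # evict the LRU line and reuse its way
            self.tag_cache.pop(old_tag * self.num_sets + index, None)  # drop its stale mapping
        s[tag] = way
        return way

    def _remember(self, line, way):
        """Record line -> way in the backup table, evicting its LRU entry if full."""
        self.tag_cache[line] = way
        self.tag_cache.move_to_end(line)
        if len(self.tag_cache) > self.tag_cache_size:
            self.tag_cache.popitem(last=False)

    def access(self, addr, hinted_way=None):
        """Simulate one load/store; hinted_way is the (hypothetical) compiler prediction."""
        line = addr // self.line_bytes
        index, tag = line % self.num_sets, line // self.num_sets
        resident_way = self.sets[index].get(tag)  # None on a cache miss
        energy = COST_ONE_WAY if hinted_way is not None else 0.0  # speculative single-way probe
        if hinted_way is not None and hinted_way == resident_way:
            path = "hint"  # static prediction was right: only one way was read
        else:
            energy += COST_TAG_CACHE  # mispredicted or unhinted: consult the backup table
            if line in self.tag_cache:
                path = "tag_cache"
                energy += COST_ONE_WAY  # read only the remembered way
            else:
                path = "full"
                energy += COST_ALL_WAYS  # conventional all-way lookup (refill energy not modeled)
        self.paths[path] += 1
        self.energy += energy
        way = self._touch(index, tag)
        self._remember(line, way)
        return way


if __name__ == "__main__":
    cache = ToyCache()
    cache.access(0)                # cold miss: full lookup
    cache.access(0, hinted_way=0)  # correct static hint: one-way probe
    cache.access(4, hinted_way=2)  # wrong hint: recovered via the backup table
    print(cache.paths)             # {'hint': 1, 'tag_cache': 1, 'full': 1}
```

In this toy model a wrong hint costs more than no hint, because the speculative single-way probe is wasted, so any savings depend on how often the static predictions are correct and on how well the backup table catches the rest.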



Published In

ACM Transactions on Computer Systems, Volume 22, Issue 2
May 2004
144 pages
ISSN: 0734-2071
EISSN: 1557-7333
DOI: 10.1145/986533
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.

Publisher

Association for Computing Machinery
New York, NY, United States


Author Tags

1. Energy efficiency
2. Translation buffers
3. Virtually addressed caches


Article Metrics

• Total citations: 0
• Total downloads: 940
• Downloads (last 12 months): 6
• Downloads (last 6 weeks): 3

Reflects downloads up to 13 Sep 2024.
