
HoPE: Hot-Cacheline Prediction for Dynamic Early Decompression in Compressed LLCs

Published: 05 April 2017

Abstract

Data compression plays a pivotal role in improving system performance and reducing energy consumption, because it increases the effective capacity of a compressed memory system without physically increasing the memory size. However, data compression techniques incur some cost, such as non-negligible compression and decompression overhead. This overhead becomes more severe when compression is used in the cache. In this article, we aim to minimize the read-hit decompression penalty in compressed Last-Level Caches (LLCs) by speculatively decompressing frequently used cachelines. To this end, we propose a Hot-cacheline Prediction and Early decompression (HoPE) mechanism that consists of three synergistic techniques: Hot-cacheline Prediction (HP), Early Decompression (ED), and Hit-history-based Insertion (HBI). HP and HBI efficiently identify the hot compressed cachelines, while ED selectively decompresses hot cachelines based on their size information. Unlike previous approaches, the HoPE framework considers the tradeoff between the increased effective cache capacity and the decompression penalty. To evaluate the effectiveness of the proposed HoPE mechanism, we run extensive simulations on memory traces obtained from multi-threaded benchmarks running on a full-system simulation framework. We observe significant performance improvements over compressed cache schemes employing the conventional Least-Recently Used (LRU) replacement policy, the Dynamic Re-Reference Interval Prediction (DRRIP) scheme, and the Effective Capacity Maximizer (ECM) compressed cache management mechanism. Specifically, over a wide range of applications, HoPE improves system performance by approximately 11% over LRU, 8% over DRRIP, and 7% over ECM, on average, by reducing the read-hit decompression penalty by around 65%.
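
The tradeoff described in the abstract can be illustrated with a toy model. The sketch below is not the HoPE design: the abstract does not give the predictor structure, thresholds, or latencies, so the hit counter, `HOT_THRESHOLD`, the cycle counts, the set budget, and all class and function names are assumptions chosen only to show the idea. A line that has been hit often enough is stored uncompressed when the set has room for its full size, so later hits skip the decompression penalty at the price of some effective capacity.

```python
from dataclasses import dataclass

# All constants below are illustrative assumptions, not values from the article.
LINE_BYTES = 64       # uncompressed cacheline size
HIT_LATENCY = 10      # cycles for an LLC read hit
DECOMP_PENALTY = 5    # extra cycles when the hit line is stored compressed
HOT_THRESHOLD = 2     # hits after which a line is treated as "hot"


@dataclass
class Line:
    tag: int
    comp_bytes: int           # size of the line when stored compressed
    compressed: bool = True
    hits: int = 0

    @property
    def stored_bytes(self):
        return self.comp_bytes if self.compressed else LINE_BYTES


class ToySet:
    """One cache set with a fixed data budget. Eviction is not modelled."""

    def __init__(self, budget_bytes, lines):
        self.budget = budget_bytes
        self.lines = {ln.tag: ln for ln in lines}

    def used(self):
        return sum(ln.stored_bytes for ln in self.lines.values())

    def read_hit(self, tag, early_decompress):
        """Return the latency of one read hit and update the hot-line state."""
        ln = self.lines[tag]
        latency = HIT_LATENCY + (DECOMP_PENALTY if ln.compressed else 0)
        ln.hits += 1
        # Hypothetical early-decompression rule: a hot line is expanded only
        # if its uncompressed size still fits in the set's data budget.
        if early_decompress and ln.compressed and ln.hits >= HOT_THRESHOLD:
            growth = LINE_BYTES - ln.comp_bytes
            if self.used() + growth <= self.budget:
                ln.compressed = False
        return latency


def total_latency(trace, early_decompress):
    lines = [Line(0, 16), Line(1, 32), Line(2, 48)]
    cache_set = ToySet(budget_bytes=160, lines=lines)
    return sum(cache_set.read_hit(tag, early_decompress) for tag in trace)


trace = [0, 0, 0, 0, 1, 2, 0, 0]   # line 0 is the frequently used one
baseline = total_latency(trace, early_decompress=False)
with_ed = total_latency(trace, early_decompress=True)
print(baseline, with_ed)           # prints "120 100"
```

In this trace, line 0 becomes hot after its second hit and is expanded from 16 to 64 bytes, so its four later hits cost 10 cycles instead of 15. Lowering `budget_bytes` below 144 blocks the expansion, which is the capacity side of the tradeoff.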

Cited By

  • (2024) Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 996-1011. DOI: 10.1109/HPCA57654.2024.00080. Online publication date: 2-Mar-2024
  • (2020) D-SOAP: Dynamic Spatial Orientation Affinity Prediction for Caching in Multi-Orientation Memory Systems. 2020 53rd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), 581-595. DOI: 10.1109/MICRO50266.2020.00055. Online publication date: Oct-2020
  • (2017) MBZip. ACM Transactions on Architecture and Code Optimization 14:4, 1-29. DOI: 10.1145/3151033. Online publication date: 5-Dec-2017

      Published In

      ACM Transactions on Design Automation of Electronic Systems, Volume 22, Issue 3
      July 2017
      440 pages
      ISSN: 1084-4309
      EISSN: 1557-7309
      DOI: 10.1145/3062395
      Editor: Naehyuck Chang
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Publication History

      Published: 05 April 2017
      Accepted: 01 September 2016
      Revised: 01 August 2016
      Received: 01 December 2015
      Published in TODAES Volume 22, Issue 3

      Author Tags

      1. Cache
      2. cache management policy
      3. compression

      Qualifiers

      • Research-article
      • Research
      • Refereed
