DOI: 10.1109/MICRO.2018.00051
research-article

Compresso: pragmatic main memory compression

Published: 20 October 2018

Abstract

Many important client and datacenter applications need larger memory capacity and higher memory bandwidth to improve performance and energy efficiency. Hardware memory compression is a promising way to provide both without increasing system cost. Unfortunately, current memory compression solutions face two significant challenges. First, keeping memory compressed requires additional memory accesses, sometimes on the critical path, which can add performance overhead. Second, they require operating system changes, both to take advantage of the increased capacity and to handle incompressible data, which delays deployment. We propose Compresso, a hardware memory compression architecture that minimizes the memory overheads of compression and requires no changes to the OS. We identify new data-movement trade-offs and propose optimizations that reduce the additional data movement to improve system efficiency. We also propose a holistic evaluation methodology for compressed systems. Our results show that Compresso achieves an average main-memory compression ratio of 1.85x, with a 24% speedup over a competitive hardware compressed system for single-core systems and 27% for multi-core systems. Compared with competitive compressed systems, Compresso not only reduces the performance overhead of compression but also increases the performance gain from higher memory capacity.
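
To make the headline numbers concrete, here is a minimal arithmetic sketch in Python. It is not taken from the paper. It assumes that a compression ratio multiplies the physical capacity visible to software, and that an "N% speedup" means the baseline runtime divided by the new runtime equals 1 + N/100. The function names and the 16 GiB figure are hypothetical and chosen only for illustration.

```python
def effective_capacity_gib(physical_gib: float, compression_ratio: float) -> float:
    """Capacity available to software if data compresses by the given average ratio.

    Assumption: the ratio applies uniformly to all of physical memory.
    """
    if compression_ratio <= 0:
        raise ValueError("compression ratio must be positive")
    return physical_gib * compression_ratio


def relative_runtime(speedup_percent: float) -> float:
    """Runtime as a fraction of the baseline runtime.

    Assumption: speedup is defined as baseline_time / new_time - 1.
    """
    return 1.0 / (1.0 + speedup_percent / 100.0)


if __name__ == "__main__":
    # Hypothetical 16 GiB machine with the reported 1.85x average ratio.
    print(f"{effective_capacity_gib(16, 1.85):.1f} GiB effective")
    # Reported speedups over the competitive compressed baseline.
    print(f"single-core runtime: {relative_runtime(24):.3f} of baseline")
    print(f"multi-core runtime:  {relative_runtime(27):.3f} of baseline")
```

Under these assumptions, a 1.85x ratio makes 16 GiB of DRAM behave like roughly 29.6 GiB, and a 24% speedup corresponds to about 81% of the baseline runtime. Real gains depend on how compressible the workload's data is.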


    Information

    Published In

    MICRO-51: Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture
    October 2018
    1015 pages
    ISBN: 9781538662403

    Publisher

    IEEE Press

    Qualifiers

    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 484 of 2,242 submissions, 22%

    Cited By

    • (2024) An Introduction to the Compute Express Link (CXL) Interconnect. ACM Computing Surveys 56:11, pp. 1-37. DOI: 10.1145/3669900. Online publication date: 8-Jul-2024
    • (2023) ZipKV: In-Memory Key-Value Store with Built-In Data Compression. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, pp. 150-162. DOI: 10.1145/3591195.3595273. Online publication date: 6-Jun-2023
    • (2022) L2C: Combining Lossy and Lossless Compression on Memory and I/O. ACM Transactions on Embedded Computing Systems 21:1, pp. 1-27. DOI: 10.1145/3481641. Online publication date: 14-Jan-2022
    • (2022) Translation-Optimized Memory Compression for Capacity. Proceedings of the 55th Annual IEEE/ACM International Symposium on Microarchitecture, pp. 992-1011. DOI: 10.1109/MICRO56248.2022.00073. Online publication date: 1-Oct-2022
    • (2021) BCD deduplication: effective memory compression using partial cache-line deduplication. Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 52-64. DOI: 10.1145/3445814.3446722. Online publication date: 19-Apr-2021
    • (2020) Safecracker: Leaking Secrets through Compressed Caches. Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 1125-1140. DOI: 10.1145/3373376.3378453. Online publication date: 9-Mar-2020
    • (2019) AVR. Proceedings of the 48th International Conference on Parallel Processing, pp. 1-10. DOI: 10.1145/3337821.3337824. Online publication date: 5-Aug-2019
    • (2019) Janus. Proceedings of the 46th International Symposium on Computer Architecture, pp. 143-156. DOI: 10.1145/3307650.3322206. Online publication date: 22-Jun-2019
    • (2019) Compress Objects, Not Cache Lines. Proceedings of the Twenty-Fourth International Conference on Architectural Support for Programming Languages and Operating Systems, pp. 229-242. DOI: 10.1145/3297858.3304006. Online publication date: 4-Apr-2019
