Compresso: pragmatic main memory compression

Authors:

Alaa R. Alameldeen

MICRO-51: Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture

Pages 546 - 558

https://doi.org/10.1109/MICRO.2018.00051

Published: 20 October 2018

Abstract

Today, larger memory capacity and higher memory bandwidth are required for better performance and energy efficiency for many important client and datacenter applications. Hardware memory compression provides a promising direction to achieve this without increasing system cost. Unfortunately, current memory compression solutions face two significant challenges. First, keeping memory compressed requires additional memory accesses, sometimes on the critical path, which can cause performance overheads. Second, they require changing the operating system to take advantage of the increased capacity, and to handle incompressible data, which delays deployment. We propose Compresso, a hardware memory compression architecture that minimizes memory overheads due to compression, with no changes to the OS. We identify new data-movement trade-offs and propose optimizations that reduce additional memory movement to improve system efficiency. We propose a holistic evaluation for compressed systems. Our results show that Compresso achieves a 1.85x compression for main memory on average, with a 24% speedup over a competitive hardware compressed system for single-core systems and 27% for multi-core systems. As compared to competitive compressed systems, Compresso not only reduces performance overhead of compression, but also increases performance gain from higher memory capacity.

Published In

MICRO-51: Proceedings of the 51st Annual IEEE/ACM International Symposium on Microarchitecture

October 2018

1015 pages

ISBN:9781538662403

General Chairs: Mark Oskin (University of Washington), Koji Inoue (Kyushu University)

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing

Publisher

IEEE Press

Conference

MICRO-51: The 51st Annual IEEE/ACM International Symposium on Microarchitecture

October 20 - 24, 2018

Fukuoka, Japan

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
