DOI: 10.1145/2540708.2540724

Linearly compressed pages: a low-complexity, low-latency main memory compression framework

Published: 07 December 2013

Abstract

Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory page, thereby increasing access latency and degrading system performance. Prior proposals for addressing this performance degradation problem are either costly or energy inefficient.
By leveraging the key insight that all cache lines within a page should be compressed to the same size, this paper proposes a new approach to main memory compression--Linearly Compressed Pages (LCP)--that avoids the performance degradation problem without requiring costly or energy-inefficient hardware. We show that any compression algorithm can be adapted to fit the requirements of LCP, and we specifically adapt two previously-proposed compression algorithms to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression.
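
To make the benefit of a uniform compressed size concrete, the sketch below (our own illustration in C; the structure and field names such as lcp_page_t and comp_line_size are hypothetical, not the paper's hardware interface) shows how a cache line's location within a compressed page reduces to one multiply and one add, rather than a walk over variable-length line sizes. The full LCP design also keeps per-page metadata and separate storage for lines that do not compress to the target size, which this sketch omits.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative per-page state; not the paper's exact metadata encoding. */
typedef struct {
    uint64_t page_base;      /* physical base address of the compressed page */
    uint32_t comp_line_size; /* fixed compressed size of every line in the page, e.g. 16 B */
} lcp_page_t;

/* Because every line in the page has the same compressed size, the memory
 * controller can locate line `index` with a single multiply and add,
 * with no per-line size lookup or prefix-sum computation. */
static uint64_t lcp_line_address(const lcp_page_t *pg, uint32_t index)
{
    return pg->page_base + (uint64_t)index * pg->comp_line_size;
}

int main(void)
{
    lcp_page_t pg = { .page_base = 0x100000, .comp_line_size = 16 };
    /* Line 5 of this page lives at base + 5 * 16 = 0x100050. */
    printf("0x%llx\n", (unsigned long long)lcp_line_address(&pg, 5));
    return 0;
}
```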
Evaluations using benchmarks from SPEC CPU2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (by 69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (by 24% on average), and improves overall performance (by 6.1%/13.9%/10.7% for single-/two-/four-core workloads on average) compared to a baseline system that does not employ main memory compression. LCP also decreases energy consumed by the main memory subsystem (by 9.5% on average over the best prior mechanism).


Published In

MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
December 2013
498 pages
ISBN:9781450326384
DOI:10.1145/2540708

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM
  2. data compression
  3. memory
  4. memory bandwidth
  5. memory capacity
  6. memory controller

Qualifiers

  • Research-article

Conference

MICRO-46

Acceptance Rates

MICRO-46 paper acceptance rate: 39 of 239 submissions (16%)
Overall acceptance rate: 484 of 2,242 submissions (22%)


Cited By

  • (2024) Hardware Compression Method for On-Chip and Interprocessor Networks with Wide Channels and Wormhole Flow Control Policy. Informatics and Automation, 23(3), 859-885. DOI: 10.15622/ia.23.3.8. Online: 28 May 2024.
  • (2024) Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 996-1011. DOI: 10.1109/HPCA57654.2024.00080. Online: 2 March 2024.
  • (2023) QZRAM: A Transparent Kernel Memory Compression System Design for Memory-Intensive Applications with QAT Accelerator Integration. Applied Sciences, 13(18), 10526. DOI: 10.3390/app131810526. Online: 21 September 2023.
  • (2023) ZipKV: In-Memory Key-Value Store with Built-In Data Compression. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, 150-162. DOI: 10.1145/3591195.3595273. Online: 6 June 2023.
  • (2023) ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.1145/3581784.3607077. Online: 12 November 2023.
  • (2023) Application of online data migration model and ID3 algorithm in sports competition data mining. International Journal of System Assurance Engineering and Management. DOI: 10.1007/s13198-023-02171-0. Online: 27 September 2023.
  • (2022) Exploiting Data Compression for Adaptive Block Placement in Hybrid Caches. Electronics, 11(2), 240. DOI: 10.3390/electronics11020240. Online: 12 January 2022.
  • (2022) L2C: Combining Lossy and Lossless Compression on Memory and I/O. ACM Transactions on Embedded Computing Systems, 21(1), 1-27. DOI: 10.1145/3481641. Online: 14 January 2022.
  • (2022) täkō. Proceedings of the 49th Annual International Symposium on Computer Architecture, 42-58. DOI: 10.1145/3470496.3527379. Online: 18 June 2022.
  • (2022) Design and Simulation of Content-Aware Hybrid DRAM-PCM Memory System. IEEE Transactions on Parallel and Distributed Systems, 33(7), 1666-1677. DOI: 10.1109/TPDS.2021.3123539. Online: 1 July 2022.
