DOI: 10.1145/2540708.2540724

Linearly compressed pages: a low-complexity, low-latency main memory compression framework

Published: 07 December 2013

Abstract

Data compression is a promising approach for meeting the increasing memory capacity demands expected in future systems. Unfortunately, existing compression algorithms do not translate well when directly applied to main memory because they require the memory controller to perform non-trivial computation to locate a cache line within a compressed memory page, thereby increasing access latency and degrading system performance. Prior proposals for addressing this performance degradation problem are either costly or energy inefficient.
By leveraging the key insight that all cache lines within a page should be compressed to the same size, this paper proposes a new approach to main memory compression--Linearly Compressed Pages (LCP)--that avoids the performance degradation problem without requiring costly or energy-inefficient hardware. We show that any compression algorithm can be adapted to fit the requirements of LCP, and we specifically adapt two previously-proposed compression algorithms to LCP: Frequent Pattern Compression and Base-Delta-Immediate Compression.
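
To make the benefit of a uniform compressed size concrete, the sketch below (our own illustration in C; the structure and field names such as lcp_page_t and comp_line_size are hypothetical, not the paper's hardware interface) shows how a cache line's location within a compressed page reduces to one multiply and one add, rather than a walk over variable-length line sizes. The full LCP design also keeps per-page metadata and separate storage for lines that do not compress to the target size, which this sketch omits.

```c
#include <stdint.h>
#include <stdio.h>

/* Illustrative per-page state; not the paper's exact metadata encoding. */
typedef struct {
    uint64_t page_base;      /* physical base address of the compressed page */
    uint32_t comp_line_size; /* fixed compressed size of every line in the page, e.g. 16 B */
} lcp_page_t;

/* Because every line in the page has the same compressed size, the memory
 * controller can locate line `index` with a single multiply and add,
 * with no per-line size lookup or prefix-sum computation. */
static uint64_t lcp_line_address(const lcp_page_t *pg, uint32_t index)
{
    return pg->page_base + (uint64_t)index * pg->comp_line_size;
}

int main(void)
{
    lcp_page_t pg = { .page_base = 0x100000, .comp_line_size = 16 };
    /* Line 5 of this page lives at base + 5 * 16 = 0x100050. */
    printf("0x%llx\n", (unsigned long long)lcp_line_address(&pg, 5));
    return 0;
}
```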
Evaluations using benchmarks from SPEC CPU2006 and five server benchmarks show that our approach can significantly increase the effective memory capacity (by 69% on average). In addition to the capacity gains, we evaluate the benefit of transferring consecutive compressed cache lines between the memory controller and main memory. Our new mechanism considerably reduces the memory bandwidth requirements of most of the evaluated benchmarks (by 24% on average), and improves overall performance (by 6.1%/13.9%/10.7% for single-/two-/four-core workloads on average) compared to a baseline system that does not employ main memory compression. LCP also decreases energy consumed by the main memory subsystem (by 9.5% on average over the best prior mechanism).


Published In

MICRO-46: Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture
December 2013
498 pages
ISBN:9781450326384
DOI:10.1145/2540708

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. DRAM
  2. data compression
  3. memory
  4. memory bandwidth
  5. memory capacity
  6. memory controller

Qualifiers

  • Research-article

Conference

MICRO-46

Acceptance Rates

MICRO-46 paper acceptance rate: 39 of 239 submissions (16%)
Overall acceptance rate: 484 of 2,242 submissions (22%)


Cited By

  • (2024) Hardware Compression Method for On-Chip and Interprocessor Networks with Wide Channels and Wormhole Flow Control Policy. Informatics and Automation, 23(3), 859-885. DOI: 10.15622/ia.23.3.8. Online: 28 May 2024.
  • (2024) Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 996-1011. DOI: 10.1109/HPCA57654.2024.00080. Online: 2 March 2024.
  • (2023) QZRAM: A Transparent Kernel Memory Compression System Design for Memory-Intensive Applications with QAT Accelerator Integration. Applied Sciences, 13(18), 10526. DOI: 10.3390/app131810526. Online: 21 September 2023.
  • (2023) ZipKV: In-Memory Key-Value Store with Built-In Data Compression. Proceedings of the 2023 ACM SIGPLAN International Symposium on Memory Management, 150-162. DOI: 10.1145/3591195.3595273. Online: 6 June 2023.
  • (2023) ReFloat: Low-Cost Floating-Point Processing in ReRAM for Accelerating Iterative Linear Solvers. Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 1-15. DOI: 10.1145/3581784.3607077. Online: 12 November 2023.
  • (2023) Application of online data migration model and ID3 algorithm in sports competition data mining. International Journal of System Assurance Engineering and Management. DOI: 10.1007/s13198-023-02171-0. Online: 27 September 2023.
  • (2022) Exploiting Data Compression for Adaptive Block Placement in Hybrid Caches. Electronics, 11(2), 240. DOI: 10.3390/electronics11020240. Online: 12 January 2022.
  • (2022) L2C: Combining Lossy and Lossless Compression on Memory and I/O. ACM Transactions on Embedded Computing Systems, 21(1), 1-27. DOI: 10.1145/3481641. Online: 14 January 2022.
  • (2022) täkō. Proceedings of the 49th Annual International Symposium on Computer Architecture, 42-58. DOI: 10.1145/3470496.3527379. Online: 18 June 2022.
  • (2022) Design and Simulation of Content-Aware Hybrid DRAM-PCM Memory System. IEEE Transactions on Parallel and Distributed Systems, 33(7), 1666-1677. DOI: 10.1109/TPDS.2021.3123539. Online: 1 July 2022.
