Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory

Authors:

Bruce Childers,

Daniel Mossé

ACM Transactions on Architecture and Code Optimization (TACO), Volume 9, Issue 4

Article No.: 55, Pages 1 - 20

https://doi.org/10.1145/2400682.2400714

Published: 20 January 2013

Abstract

Limited PCM write bandwidth is a critical obstacle to achieving good performance from hybrid DRAM/PCM memory systems. Write bandwidth is severely restricted in PCM devices, which harms application performance. Indeed, as we show, reducing PCM write traffic matters more for application performance than reducing PCM read latency. To reduce the number of PCM writes, we propose a DRAM cache organization that employs compression. A new delta compression technique for modified data is used to achieve a large compression ratio. Our approach can apply compression selectively and predictively to improve its efficiency and performance. It is designed to facilitate adoption in existing main memory compression frameworks. We describe one way to incorporate delta compression into IBM's MXT memory compression architecture when it is used as the DRAM cache in a hybrid main memory. For fourteen representative memory-intensive workloads, our delta compression technique reduces the number of PCM writes by 54.3% on average and improves IPC by 24.4% on average.
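
The abstract does not spell out the delta encoding itself. As a rough illustration of the general idea only, the following Python sketch stores a modified cache line as the list of bytes that differ from its unmodified copy, and keeps the delta only when it is smaller than the raw line. The 64-byte line size, the (offset, value) byte-pair format, and the names `delta_encode`, `delta_decode`, `encoded_size`, and `should_compress` are all assumptions made for this example; they are not taken from the article or from MXT.

```python
from typing import List, Tuple

LINE_SIZE = 64  # bytes per line; assumed for this sketch, not from the article

Delta = List[Tuple[int, int]]  # (byte offset, new byte value) pairs


def delta_encode(original: bytes, modified: bytes) -> Delta:
    """Record only the bytes of `modified` that differ from `original`."""
    if len(original) != len(modified):
        raise ValueError("both lines must have the same length")
    return [(i, m) for i, (o, m) in enumerate(zip(original, modified)) if o != m]


def delta_decode(original: bytes, delta: Delta) -> bytes:
    """Rebuild the modified line by patching the original with the delta."""
    line = bytearray(original)
    for offset, value in delta:
        line[offset] = value
    return bytes(line)


def encoded_size(delta: Delta) -> int:
    """Bytes needed for the delta: one offset byte plus one data byte per entry
    (hypothetical format; valid while lines are at most 256 bytes long)."""
    return 2 * len(delta)


def should_compress(original: bytes, modified: bytes) -> bool:
    """Hypothetical selection rule: keep the delta only if it beats the raw line."""
    return encoded_size(delta_encode(original, modified)) < len(modified)


if __name__ == "__main__":
    clean = bytes(LINE_SIZE)          # unmodified copy of the line
    dirty = bytearray(clean)
    dirty[3] = 0xAB                   # two bytes change after a store
    dirty[40] = 0x01
    delta = delta_encode(clean, bytes(dirty))
    print(delta, encoded_size(delta), should_compress(clean, bytes(dirty)))
```

In this toy format a line with only a few changed bytes shrinks from 64 bytes to a handful, while a line in which most bytes changed is better left uncompressed, which is the kind of case a selective policy would skip.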

Published In

ACM Transactions on Architecture and Code Optimization Volume 9, Issue 4

Special Issue on High-Performance Embedded Architectures and Compilers

January 2013

876 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/2400682

Copyright © 2013 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 20 January 2013

Accepted: 01 November 2012

Revised: 01 November 2012

Received: 01 June 2012

Published in TACO Volume 9, Issue 4
