research-article
Open access

Delta-compressed caching for overcoming the write bandwidth limitation of hybrid main memory

Published: 20 January 2013

Abstract

Limited phase-change memory (PCM) write bandwidth is a critical obstacle to achieving good performance from hybrid DRAM/PCM memory systems. Write bandwidth in PCM devices is severely restricted, which harms application performance. Indeed, as we show, reducing PCM write traffic matters more for application performance than reducing PCM read latency. To reduce the number of PCM writes, we propose a DRAM cache organization that employs compression. A new delta compression technique for modified data achieves a large compression ratio. Our approach applies compression selectively and predictively to improve its efficiency and performance, and it is designed to facilitate adoption in existing main memory compression frameworks. We describe one way to incorporate delta compression into IBM's MXT memory compression architecture when it is used as the DRAM cache in a hybrid main memory. Across fourteen representative memory-intensive workloads, our delta compression technique reduces the number of PCM writes by 54.3% and improves IPC by 24.4% on average.
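
As a rough illustration of the general idea behind delta compression of modified data, the sketch below encodes a dirty cache line as the set of words that differ from its unmodified copy. It is not the encoding used in the article, which the abstract does not specify. The 64-byte line size, the 4-byte word granularity, the bitmask-plus-payload layout, and all function names are assumptions made for this example.

```python
LINE_SIZE = 64   # bytes per cache line (assumed for illustration)
WORD_SIZE = 4    # granularity at which changes are detected (assumed)
NUM_WORDS = LINE_SIZE // WORD_SIZE


def delta_encode(base: bytes, modified: bytes) -> tuple[int, bytes]:
    """Return (bitmask of changed words, concatenated changed words)."""
    if len(base) != LINE_SIZE or len(modified) != LINE_SIZE:
        raise ValueError("both lines must be LINE_SIZE bytes long")
    mask = 0
    payload = bytearray()
    for i in range(NUM_WORDS):
        lo, hi = i * WORD_SIZE, (i + 1) * WORD_SIZE
        if base[lo:hi] != modified[lo:hi]:
            mask |= 1 << i              # mark word i as changed
            payload += modified[lo:hi]  # keep only the new value
    return mask, bytes(payload)


def delta_decode(base: bytes, mask: int, payload: bytes) -> bytes:
    """Rebuild the modified line from the base line and its delta."""
    out = bytearray(base)
    pos = 0
    for i in range(NUM_WORDS):
        if mask & (1 << i):
            lo = i * WORD_SIZE
            out[lo:lo + WORD_SIZE] = payload[pos:pos + WORD_SIZE]
            pos += WORD_SIZE
    return bytes(out)


def encoded_size(payload: bytes) -> int:
    """Bytes needed for the delta: one mask bit per word plus the payload."""
    return (NUM_WORDS + 7) // 8 + len(payload)


if __name__ == "__main__":
    base = bytes(LINE_SIZE)
    modified = bytearray(base)
    modified[8] = 0xFF                       # touches word 2
    modified[40:44] = b"\x01\x02\x03\x04"    # touches word 10
    mask, payload = delta_encode(base, bytes(modified))
    assert delta_decode(base, mask, payload) == bytes(modified)
    print(f"{encoded_size(payload)} bytes instead of {LINE_SIZE}")
```

When only a few words of a line change, the delta is much smaller than the line itself, which is the intuition behind compressing modified data to cut write traffic. When most words change, the delta can exceed the raw line, which is one reason a scheme might apply compression selectively.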


    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 9, Issue 4
    Special Issue on High-Performance Embedded Architectures and Compilers
    January 2013
    876 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/2400682
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 20 January 2013
    Accepted: 01 November 2012
    Revised: 01 November 2012
    Received: 01 June 2012
    Published in TACO Volume 9, Issue 4

    Author Tags

    1. Phase change memory
    2. memory compression

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Bibliometrics

    • Downloads (last 12 months): 102
    • Downloads (last 6 weeks): 12
    Reflects downloads up to 09 Feb 2025

    Cited By

    • (2023) Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 137-151. DOI: 10.1109/HPCA56546.2023.10071115. Online publication date: Feb-2023
    • (2023) rShare. Journal of Systems Architecture: the EUROMICRO Journal 145:C. DOI: 10.1016/j.sysarc.2023.103009. Online publication date: 1-Dec-2023
    • (2021) gRemote. Journal of Systems Architecture: the EUROMICRO Journal 116:C. DOI: 10.1016/j.sysarc.2021.102055. Online publication date: 1-Jun-2021
    • (2019) Hyper switching memory utilization on hybrid main memory for improved task execution and reduced power consumption. Microprocessors and Microsystems, 102891. DOI: 10.1016/j.micpro.2019.102891. Online publication date: Sep-2019
    • (2018) CMH. Proceedings of the 15th ACM International Conference on Computing Frontiers, 121-128. DOI: 10.1145/3203217.3203235. Online publication date: 8-May-2018
    • (2017) DICE. ACM SIGARCH Computer Architecture News 45:2, 627-638. DOI: 10.1145/3140659.3080243. Online publication date: 24-Jun-2017
    • (2017) DICE. Proceedings of the 44th Annual International Symposium on Computer Architecture, 627-638. DOI: 10.1145/3079856.3080243. Online publication date: 24-Jun-2017
    • (2017) Proof-Carrying Hardware via Inductive Invariants. ACM Transactions on Design Automation of Electronic Systems 22:4, 1-23. DOI: 10.1145/3054743. Online publication date: 20-Jul-2017
    • (2017) Approximate Energy-Efficient Encoding for Serial Interfaces. ACM Transactions on Design Automation of Electronic Systems 22:4, 1-25. DOI: 10.1145/3041220. Online publication date: 20-May-2017
    • (2017) Optimal Scheduling and Allocation for IC Design Management and Cost Reduction. ACM Transactions on Design Automation of Electronic Systems 22:4, 1-30. DOI: 10.1145/3035483. Online publication date: 9-Jun-2017
