research-article

BCD deduplication: effective memory compression using partial cache-line deduplication

Authors:

G. Edward Suh

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 52 - 64

https://doi.org/10.1145/3445814.3446722

Published: 17 April 2021

Abstract

In this paper, we identify a new form of partial data redundancy among multiple cache lines that is not exploited by traditional memory compression or memory deduplication. We propose Base and Compressed Difference (BCD) deduplication, which exploits these partial matches among cache lines through a novel combination of compression and deduplication to increase the effective capacity of main memory. Experimental results show that BCD achieves an average compression ratio of 1.94× for SPEC2017, DaCapo, TPC-DS, and TPC-H, which is 48.4% higher than the best prior work. We also present an efficient implementation of BCD in a modern memory hierarchy that compresses data in both the last-level cache (LLC) and main memory with modest area overhead. Even with additional metadata accesses and compression/deduplication operations, cycle-level simulations show that BCD improves the performance of the SPEC2017 benchmarks by 2.7% on average because it increases the effective capacity of the LLC. Overall, the results show that BCD can significantly increase the capacity of main memory with little performance overhead.
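
To make the idea of a partial match between cache lines concrete, the following is a minimal software sketch, not the hardware design evaluated in the paper. It assumes a 64-byte line viewed as eight 8-byte words, treats the high-order bytes of each word as a shareable "base" and the low-order bytes as a per-line "difference", and deduplicates only the bases. The constants, `split_line`, `make_line`, and `PartialDedupStore` are hypothetical names chosen for illustration; the difference is stored uncompressed here, and metadata cost is ignored.

```python
from typing import Dict, List, Tuple

LINE_BYTES = 64   # cache-line size (assumption)
WORD_BYTES = 8    # view the line as eight 64-bit words (assumption)
DIFF_BYTES = 2    # low-order bytes per word kept as the difference (assumption)
HIGH_BYTES = WORD_BYTES - DIFF_BYTES


def make_line(values: List[int]) -> bytes:
    """Pack eight 64-bit integers into one line (big-endian for readability)."""
    assert len(values) == LINE_BYTES // WORD_BYTES
    return b"".join(v.to_bytes(WORD_BYTES, "big") for v in values)


def split_line(line: bytes) -> Tuple[bytes, bytes]:
    """Split a line into (base, diff): high vs. low bytes of every word."""
    assert len(line) == LINE_BYTES
    base, diff = bytearray(), bytearray()
    for i in range(0, LINE_BYTES, WORD_BYTES):
        word = line[i:i + WORD_BYTES]
        base += word[:HIGH_BYTES]
        diff += word[HIGH_BYTES:]
    return bytes(base), bytes(diff)


class PartialDedupStore:
    """Stores each distinct base once; every line keeps only its own diff."""

    def __init__(self) -> None:
        self.base_ids: Dict[bytes, int] = {}        # base bytes -> base id
        self.base_list: List[bytes] = []            # base id -> base bytes
        self.lines: List[Tuple[int, bytes]] = []    # line index -> (base id, diff)

    def write(self, line: bytes) -> int:
        base, diff = split_line(line)
        base_id = self.base_ids.get(base)
        if base_id is None:                         # first time this base is seen
            base_id = len(self.base_list)
            self.base_ids[base] = base_id
            self.base_list.append(base)
        self.lines.append((base_id, diff))
        return len(self.lines) - 1

    def read(self, index: int) -> bytes:
        base_id, diff = self.lines[index]
        base = self.base_list[base_id]
        out = bytearray()
        for w in range(LINE_BYTES // WORD_BYTES):   # re-interleave base and diff
            out += base[w * HIGH_BYTES:(w + 1) * HIGH_BYTES]
            out += diff[w * DIFF_BYTES:(w + 1) * DIFF_BYTES]
        return bytes(out)

    def stored_bytes(self) -> int:
        """Payload bytes only; ids and lookup structures are not counted."""
        return (sum(len(b) for b in self.base_list)
                + sum(len(d) for _, d in self.lines))
```

In this sketch, two lines of pointer-like values that differ only in their low two bytes per word are not whole-line duplicates, so conventional deduplication would store both in full; with the split above they share a single base and only their differences are stored separately.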

Published In

ASPLOS '21: Proceedings of the 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 2021

1090 pages

ISBN:9781450383172

DOI:10.1145/3445814

General Chair: Tim Sherwood (University of California at Santa Barbara, USA)

Program Chairs: Emery Berger (University of Massachusetts at Amherst, USA), Christos Kozyrakis (Stanford University, USA)

Copyright © 2021 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGPLAN: ACM Special Interest Group on Programming Languages

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Conference

ASPLOS '21

Sponsor:

SIGPLAN

ASPLOS '21: 26th ACM International Conference on Architectural Support for Programming Languages and Operating Systems

April 19 - 23, 2021

Virtual, USA

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Bibliometrics & Citations

Article Metrics

Total Citations: 8
Total Downloads: 1,092
Downloads (Last 12 months): 176
Downloads (Last 6 weeks): 25

Reflects downloads up to 04 Oct 2024

Citations

Cited By

Wei G, Li C, Xu R, Zhuge Q, Sha E (2024). Sparrow: Flexible Memory Deduplication in Android Systems with Similar-Page Awareness. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6. Online publication date: 25-Mar-2024.
https://doi.org/10.23919/DATE58400.2024.10546588
Du C, Lin Z, Wu S, Chen Y, Wu J, Wang S, Wang W, Wu Q, Mao B (2024). FSDedup: Feature-Aware and Selective Deduplication for Improving Performance of Encrypted Non-Volatile Main Memory. ACM Transactions on Storage, 20:4, 1-33. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.1145/3662736
Cheshmikhani E, Shokouhinia F, Farbeh H (2024). A Low-Cost Fault-Tolerant Racetrack Cache Based on Data Compression. IEEE Transactions on Circuits and Systems II: Express Briefs, 71:8, 3940-3944. Online publication date: Aug-2024.
https://doi.org/10.1109/TCSII.2024.3375640
Panwar G, Laghari M, Choukse E, Jian X (2024). DyLeCT: Achieving Huge-page-like Translation Performance for Hardware-compressed Memory. 2024 ACM/IEEE 51st Annual International Symposium on Computer Architecture (ISCA), 1129-1143. Online publication date: 29-Jun-2024.
https://doi.org/10.1109/ISCA59077.2024.00085
Loloeyan P, Nikmehr H, Rezaei M (2024). A novel approximate cache block compressor for error-resilient image data. Computers and Electrical Engineering, 115:C. Online publication date: 2-Jul-2024.
https://dl.acm.org/doi/10.1016/j.compeleceng.2024.109106
Gao C, Xu X, Yang Z, Lin L, Li J (2023). QZRAM: A Transparent Kernel Memory Compression System Design for Memory-Intensive Applications with QAT Accelerator Integration. Applied Sciences, 13:18 (10526). Online publication date: 21-Sep-2023.
https://doi.org/10.3390/app131810526
Du C, Wu S, Wu J, Mao B, Wang S (2023). ESD: An ECC-assisted and Selective Deduplication for Encrypted Non-Volatile Main Memory. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 977-990. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10071011
He Q, Zhang F, Bian G, Zhang W, Duan D, Li Z, Chen C (2022). Research on Data Routing Strategy of Deduplication in Cloud Environment. IEEE Access, 10, 9529-9542. Online publication date: 2022.
https://doi.org/10.1109/ACCESS.2021.3139757
Mo Y, Hua Y, Li P, Cao Q, Liu X (2021). A Cost-Efficient Metadata Scheme for High-Performance Deduplication Systems. 2021 IEEE 23rd Int Conf on High Performance Computing & Communications; 7th Int Conf on Data Science & Systems; 19th Int Conf on Smart City; 7th Int Conf on Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 49-56. Online publication date: Dec-2021.
https://doi.org/10.1109/HPCC-DSS-SmartCity-DependSys53884.2021.00034
