Thesaurus: Efficient Cache Compression via Dynamic Clustering

Authors: Amin Ghasemazar, Mieszko Lis

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems

Pages 527 - 540

https://doi.org/10.1145/3373376.3378518

Published: 13 March 2020

Abstract

In this paper, we identify a previously untapped source of compressibility in cache working sets: clusters of cachelines that are similar, but not identical, to one another. To compress the cache, we can then store the "clusteroid" of each cluster together with the (much smaller) "diffs" needed to reconstruct the rest of the cluster. To exploit this opportunity, we propose a hardware-level on-line cacheline clustering mechanism based on locality-sensitive hashing. Our method dynamically forms clusters as they appear in the data access stream and retires them as they disappear from the cache. Our evaluations show that we achieve 2.25× compression on average (and up to 9.9×) on the SPEC CPU 2017 suite, which is significantly higher than prior proposals scaled to an iso-silicon budget.
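The clusteroid-plus-diff idea can be illustrated with a small software sketch. This is not the hardware design from the paper, which the abstract does not detail. It is a minimal model under stated assumptions: a 64-byte cacheline, a 12-bit fingerprint built from sparse random sign projections (one common locality-sensitive hashing family), the first line seen for a fingerprint acting as the clusteroid, and diffs stored as (offset, byte) pairs. The names `ClusteredStore`, `fingerprint`, and `byte_diff` are hypothetical.

```python
import random

LINE_BYTES = 64        # assumed cacheline size
FINGERPRINT_BITS = 12  # assumed LSH fingerprint width


def make_projections(seed=0):
    """One sparse {-1, 0, +1} weight vector per fingerprint bit (assumed LSH family)."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 0, 0, 1)) for _ in range(LINE_BYTES)]
            for _ in range(FINGERPRINT_BITS)]


def fingerprint(line, projections):
    """Concatenate the signs of the projections; similar lines tend to collide."""
    bits = 0
    for weights in projections:
        dot = sum(w * b for w, b in zip(weights, line))
        bits = (bits << 1) | int(dot >= 0)
    return bits


def byte_diff(base, line):
    """List the (offset, byte) pairs where line differs from base."""
    return [(i, b) for i, (a, b) in enumerate(zip(base, line)) if a != b]


def apply_diff(base, diff):
    """Rebuild a line from its clusteroid and its diff."""
    out = bytearray(base)
    for i, b in diff:
        out[i] = b
    return bytes(out)


class ClusteredStore:
    """Toy model: one clusteroid per fingerprint, every line stored as a diff."""

    def __init__(self, projections):
        self.projections = projections
        self.clusteroids = {}  # fingerprint -> clusteroid bytes
        self.refs = {}         # fingerprint -> number of resident lines
        self.lines = {}        # address -> (fingerprint, diff)

    def insert(self, addr, line):
        if addr in self.lines:
            self.evict(addr)  # overwrite: drop the old version first
        fp = fingerprint(line, self.projections)
        # The first line seen for a fingerprint becomes the clusteroid.
        base = self.clusteroids.setdefault(fp, bytes(line))
        self.refs[fp] = self.refs.get(fp, 0) + 1
        self.lines[addr] = (fp, byte_diff(base, line))

    def read(self, addr):
        fp, diff = self.lines[addr]
        return apply_diff(self.clusteroids[fp], diff)

    def evict(self, addr):
        fp, _ = self.lines.pop(addr)
        self.refs[fp] -= 1
        if self.refs[fp] == 0:  # retire the cluster once it is empty
            del self.refs[fp]
            del self.clusteroids[fp]
```

In this sketch reconstruction is always exact, because a line that lands in a poorly matching cluster simply carries a larger diff. Compression therefore depends on how often similar lines share a fingerprint, which is the property locality-sensitive hashing is meant to provide.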

Index Terms

Thesaurus: Efficient Cache Compression via Dynamic Clustering
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Processors and memory architectures

Published In

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems

March 2020

1412 pages

ISBN:9781450371025

DOI:10.1145/3373376

General Chair: James Larus (EPFL)

Program Chairs: Luis Ceze (University of Washington), Karin Strauss (Microsoft)

Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

In-Cooperation

SIGBED: ACM Special Interest Group on Embedded Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Natural Sciences and Engineering Research Council of Canada

Conference

ASPLOS '20

ASPLOS '20: Architectural Support for Programming Languages and Operating Systems

March 16 - 20, 2020

Lausanne, Switzerland

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Article Metrics

Total Citations: 14
Total Downloads: 970
Downloads (last 12 months): 86
Downloads (last 6 weeks): 4

Reflects downloads up to 24 Jan 2025

Cited By

Cheshmikhani E, Shokouhinia F, Farbeh H (2024). A Low-Cost Fault-Tolerant Racetrack Cache Based on Data Compression. IEEE Transactions on Circuits and Systems II: Express Briefs 71:8, 3940-3944. Online publication date: Aug-2024.
https://doi.org/10.1109/TCSII.2024.3375640
John F, Sleeba S (2024). A Comprehensive Evaluation of Packet Compression Techniques for NoC. 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES), 1-5. Online publication date: 20-Sep-2024.
https://doi.org/10.1109/SPICES62143.2024.10779893
Buyuktosunoglu A, Trilla D, Abali B, Berger D, Walters C, Lee J (2024). Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 996-1011. Online publication date: 2-Mar-2024.
https://doi.org/10.1109/HPCA57654.2024.00080
Li Y, Gao M (2023). Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 137-151. Online publication date: Feb-2023.
https://doi.org/10.1109/HPCA56546.2023.10071115
Adnan M, Maboud Y, Mahajan D, Nair P (2022). Accelerating recommendation system training by leveraging popular choices. Proceedings of the VLDB Endowment 15:1, 127-140. Online publication date: 14-Jan-2022.
https://dl.acm.org/doi/10.14778/3485450.3485462
Deb D, M.K R, Jose J (2022). FlitZip: Effective Packet Compression for NoC in MultiProcessor System-on-Chip. IEEE Transactions on Parallel and Distributed Systems 33:1, 117-128. Online publication date: 1-Jan-2022.
https://doi.org/10.1109/TPDS.2021.3090315
Angerd A, Arelakis A, Spiliopoulos V, Sintorn E, Stenstrom P (2022). GBDI: Going Beyond Base-Delta-Immediate Compression with Global Bases. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1115-1127. Online publication date: Apr-2022.
https://doi.org/10.1109/HPCA53966.2022.00085
Kim J, Kang M, Hong J, Kim S (2022). Exploiting Inter-block Entropy to Enhance the Compressibility of Blocks with Diverse Data. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA), 1100-1114. Online publication date: Apr-2022.
https://doi.org/10.1109/HPCA53966.2022.00084
Soundararajan N, Braun P, Khan T, Kasikci B, Litz H, Subramoney S (2021). PDede: Partitioned, Deduplicated, Delta Branch Target Buffer. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture, 779-791. Online publication date: 18-Oct-2021.
https://dl.acm.org/doi/10.1145/3466752.3480046
Tomei M, Das S, Seyedzadeh M, Bedoukian P, Beckmann B, Kumar R, Wood D (2021). Byte-Select Compression. ACM Transactions on Architecture and Code Optimization 18:4, 1-27. Online publication date: 3-Sep-2021.
https://dl.acm.org/doi/10.1145/3462209
