DOI: 10.1145/3373376.3378518
research-article

Thesaurus: Efficient Cache Compression via Dynamic Clustering

Published: 13 March 2020

Abstract

In this paper, we identify a previously untapped source of compressibility in cache working sets: clusters of cachelines that are similar, but not identical, to one another. To compress the cache, we can then store the "clusteroid" of each cluster together with the (much smaller) "diffs" needed to reconstruct the rest of the cluster. To exploit this opportunity, we propose a hardware-level on-line cacheline clustering mechanism based on locality-sensitive hashing. Our method dynamically forms clusters as they appear in the data access stream and retires them as they disappear from the cache. Our evaluations show that we achieve 2.25× compression on average (and up to 9.9×) on the SPEC CPU 2017 suite, which is significantly higher than prior proposals scaled to an iso-silicon budget.
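
The clusteroid-plus-diff idea can be illustrated with a small software model. The sketch below is not the paper's hardware design: it assumes 64-byte cachelines, a 12-bit fingerprint built from a sparse random projection with entries in {-1, 0, +1}, and a byte-granularity diff, and every name in it (`make_projection`, `fingerprint`, `ClusterCache`) is hypothetical. Lines that hash to the same fingerprint share a clusteroid, and a cluster is retired when its last line is evicted.

```python
import random

LINE_BYTES = 64   # assumed cacheline size in bytes
HASH_BITS = 12    # assumed fingerprint width (hypothetical choice)


def make_projection(seed=0, bits=HASH_BITS, n=LINE_BYTES):
    """Sparse random projection matrix with entries in {-1, 0, +1}."""
    rng = random.Random(seed)
    return [[rng.choice((-1, 0, 0, 0, 0, 1)) for _ in range(n)]
            for _ in range(bits)]


def fingerprint(line, proj):
    """One bit per projection row: the sign of the dot product with the line."""
    fp = 0
    for row in proj:
        dot = sum(w * b for w, b in zip(row, line))
        fp = (fp << 1) | int(dot > 0)
    return fp


class ClusterCache:
    """Toy model: one clusteroid per fingerprint, other lines stored as diffs."""

    def __init__(self, proj):
        self.proj = proj
        self.clusteroids = {}  # fingerprint -> clusteroid line (bytes)
        self.refs = {}         # fingerprint -> number of resident lines
        self.lines = {}        # address -> (fingerprint, [(offset, byte), ...])

    def insert(self, addr, line):
        if addr in self.lines:
            self.evict(addr)
        fp = fingerprint(line, self.proj)
        # The first line seen with this fingerprint becomes the clusteroid.
        base = self.clusteroids.setdefault(fp, bytes(line))
        diff = [(i, b) for i, (a, b) in enumerate(zip(base, line)) if a != b]
        self.refs[fp] = self.refs.get(fp, 0) + 1
        self.lines[addr] = (fp, diff)
        return len(diff)  # number of bytes that differ from the clusteroid

    def read(self, addr):
        """Reconstruct a line by patching its diff onto the clusteroid."""
        fp, diff = self.lines[addr]
        out = bytearray(self.clusteroids[fp])
        for i, b in diff:
            out[i] = b
        return bytes(out)

    def evict(self, addr):
        fp, _ = self.lines.pop(addr)
        self.refs[fp] -= 1
        if self.refs[fp] == 0:  # retire the cluster once its last line leaves
            del self.refs[fp]
            del self.clusteroids[fp]
```

Because the hash is locality-sensitive rather than exact, two similar lines are likely, but not guaranteed, to land in the same cluster. Reconstruction stays correct either way, since a line that misses its natural cluster simply starts a new one with an empty diff.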



Published In

ASPLOS '20: Proceedings of the Twenty-Fifth International Conference on Architectural Support for Programming Languages and Operating Systems
March 2020
1412 pages
ISBN: 9781450371025
DOI: 10.1145/3373376

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cache compression
  2. dynamic clustering
  3. lsh
  4. memory hierarchy

Qualifiers

  • Research-article

Conference

ASPLOS '20

Acceptance Rates

Overall Acceptance Rate 535 of 2,713 submissions, 20%

Cited By

  • (2024) A Low-Cost Fault-Tolerant Racetrack Cache Based on Data Compression. IEEE Transactions on Circuits and Systems II: Express Briefs 71:8 (3940-3944). DOI: 10.1109/TCSII.2024.3375640. Online publication date: Aug-2024
  • (2024) A Comprehensive Evaluation of Packet Compression Techniques for NoC. 2024 IEEE International Conference on Signal Processing, Informatics, Communication and Energy Systems (SPICES) (1-5). DOI: 10.1109/SPICES62143.2024.10779893. Online publication date: 20-Sep-2024
  • (2024) Enterprise-Class Cache Compression Design. 2024 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (996-1011). DOI: 10.1109/HPCA57654.2024.00080. Online publication date: 2-Mar-2024
  • (2023) Baryon: Efficient Hybrid Memory Management with Compression and Sub-Blocking. 2023 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (137-151). DOI: 10.1109/HPCA56546.2023.10071115. Online publication date: Feb-2023
  • (2022) Accelerating recommendation system training by leveraging popular choices. Proceedings of the VLDB Endowment 15:1 (127-140). DOI: 10.14778/3485450.3485462. Online publication date: 14-Jan-2022
  • (2022) FlitZip: Effective Packet Compression for NoC in MultiProcessor System-on-Chip. IEEE Transactions on Parallel and Distributed Systems 33:1 (117-128). DOI: 10.1109/TPDS.2021.3090315. Online publication date: 1-Jan-2022
  • (2022) GBDI: Going Beyond Base-Delta-Immediate Compression with Global Bases. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (1115-1127). DOI: 10.1109/HPCA53966.2022.00085. Online publication date: Apr-2022
  • (2022) Exploiting Inter-block Entropy to Enhance the Compressibility of Blocks with Diverse Data. 2022 IEEE International Symposium on High-Performance Computer Architecture (HPCA) (1100-1114). DOI: 10.1109/HPCA53966.2022.00084. Online publication date: Apr-2022
  • (2021) PDede: Partitioned, Deduplicated, Delta Branch Target Buffer. MICRO-54: 54th Annual IEEE/ACM International Symposium on Microarchitecture (779-791). DOI: 10.1145/3466752.3480046. Online publication date: 18-Oct-2021
  • (2021) Byte-Select Compression. ACM Transactions on Architecture and Code Optimization 18:4 (1-27). DOI: 10.1145/3462209. Online publication date: 3-Sep-2021
