AC-Cache: A Memory-Efficient Caching System for Small Objects via Exploiting Access Correlations

Authors:

Jiwu Shu

PPoPP '25: Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming

Pages 142 - 155

https://doi.org/10.1145/3710848.3710856

Published: 28 February 2025

Abstract

In-memory key-value (KV) caching bridges the performance gap between high-performance networks and disk devices. However, prior in-memory KV caching systems either target large objects or introduce additional memory overhead. In this paper, we conduct a systematic analysis of 56 production traces and make three observations: (i) small objects dominate the traces, and data accesses are highly skewed; (ii) the hotness of objects remains stable across days; and (iii) the multi-get operation, which retrieves multiple objects from the same node, incurs much lower tail latency than issuing only single-get operations.
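To make observations (i) and (ii) concrete, the following is a minimal sketch of how one might quantify them from per-object access counts: the share of accesses absorbed by the hottest fraction of objects, and the overlap between the top-k hot sets of two days. The helper names (`top_share`, `hot_set`, `hot_overlap`), the toy counts, and the use of Jaccard similarity as the stability metric are illustrative assumptions, not the paper's analysis methodology.

```python
from collections import Counter
from typing import Mapping, Set


def top_share(counts: Mapping[str, int], fraction: float) -> float:
    """Fraction of all accesses that hit the hottest `fraction` of objects."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    ranked = sorted(counts.values(), reverse=True)
    k = max(1, int(len(ranked) * fraction))  # always consider at least one object
    return sum(ranked[:k]) / total


def hot_set(counts: Mapping[str, int], k: int) -> Set[str]:
    """Keys of the k most-accessed objects; ties are broken by key for determinism."""
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return {key for key, _ in ranked[:k]}


def hot_overlap(day1: Mapping[str, int], day2: Mapping[str, int], k: int) -> float:
    """Jaccard similarity of the top-k hot sets of two days (1.0 means identical)."""
    a, b = hot_set(day1, k), hot_set(day2, k)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Hypothetical per-day access counts, made up for illustration only.
day1 = Counter({"a": 90, "b": 5, "c": 3, "d": 2})
day2 = Counter({"a": 80, "b": 10, "e": 6, "c": 4})
print(top_share(day1, 0.25))       # hottest 25% of objects -> 0.9
print(hot_overlap(day1, day2, 3))  # {a, b, c} vs {a, b, e} -> 0.5
```

A high `top_share` for a small `fraction` indicates skew, and a `hot_overlap` close to 1.0 across consecutive days indicates that a placement computed from yesterday's hotness would still fit today's load.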

These observations motivate the design of AC-Cache, an access-correlation-aware in-memory caching system for small objects. AC-Cache comprises three design primitives: (i) we formulate the distribution of KV objects as an integer linear programming problem to balance data accesses and memory consumption; (ii) we capture access correlations in a memory-efficient manner and generate fine-grained correlation groups; and (iii) we formulate the distribution of the correlation groups as a maximum flow problem to balance data accesses, and use a heuristic algorithm to dispatch the remaining KV objects to balance memory consumption. Extensive experiments with billions of objects on Alibaba Cloud show that AC-Cache reduces tail latency by 5.1-80.2% and increases access throughput by 42.8-534.8%.
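The abstract does not say how primitive (ii) records correlations. As a hedged illustration, assuming that "co-accessed" means two keys appear in the same multi-get request, the sketch below counts key pairs in a fixed-size Count-Min Sketch (Cormode and Muthukrishnan), so memory stays constant no matter how many distinct pairs occur, and then merges keys whose estimated co-access count reaches a threshold into groups using union-find. All names (`CountMinSketch`, `pair_key`, `record_multiget`, `correlation_groups`), the pair encoding, and the threshold rule are hypothetical; AC-Cache's actual correlation capture and group generation may differ.

```python
import hashlib
from itertools import combinations
from typing import Dict, Iterable, List, Sequence


class CountMinSketch:
    """Fixed-memory frequency estimator with `depth` rows of `width` counters.
    Estimates never undercount; hash collisions can only make them overcount."""

    def __init__(self, width: int = 2048, depth: int = 4) -> None:
        self.width, self.depth = width, depth
        self.rows = [[0] * width for _ in range(depth)]

    def _index(self, item: str, row: int) -> int:
        # One hash per row, derived by prefixing the item with the row number.
        digest = hashlib.blake2b(f"{row}:{item}".encode(), digest_size=8).digest()
        return int.from_bytes(digest, "little") % self.width

    def add(self, item: str, count: int = 1) -> None:
        for r in range(self.depth):
            self.rows[r][self._index(item, r)] += count

    def estimate(self, item: str) -> int:
        return min(self.rows[r][self._index(item, r)] for r in range(self.depth))


def pair_key(a: str, b: str) -> str:
    """Order-independent encoding of an unordered pair of object keys."""
    x, y = sorted((a, b))
    return f"{x}\x00{y}"


def record_multiget(cms: CountMinSketch, keys: Sequence[str]) -> None:
    """Count every pair of distinct keys fetched by one multi-get as one co-access."""
    for a, b in combinations(sorted(set(keys)), 2):
        cms.add(pair_key(a, b))


def correlation_groups(cms: CountMinSketch, requests: Iterable[Sequence[str]],
                       threshold: int) -> List[List[str]]:
    """Union keys whose estimated co-access count meets `threshold`.
    The sketch cannot enumerate pairs, so candidates come from a request sample."""
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for keys in requests:
        for a, b in combinations(sorted(set(keys)), 2):
            if cms.estimate(pair_key(a, b)) >= threshold:
                parent[find(a)] = find(b)

    groups: Dict[str, List[str]] = {}
    for x in list(parent):
        groups.setdefault(find(x), []).append(x)
    return sorted((sorted(g) for g in groups.values()), key=lambda g: g[0])


# Hypothetical multi-get requests, made up for illustration only.
requests = [["u1", "u2"], ["u1", "u2", "u3"], ["u2", "u3"], ["x", "y"], ["x", "z"]]
cms = CountMinSketch()
for r in requests:
    record_multiget(cms, r)
print(correlation_groups(cms, requests, threshold=2))
```

Because a Count-Min Sketch only overestimates, a candidate pair whose true count meets the threshold is never rejected, while heavy collisions may group a pair spuriously; a wider sketch trades memory for accuracy. Keys that end up in no group correspond to the remaining KV objects that primitive (iii) dispatches to balance memory consumption.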

Index Terms

Information systems > Information storage systems > Storage architectures > Distributed storage

Published In

PPoPP '25: Proceedings of the 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming

February 2025

580 pages

ISBN: 9798400714436

DOI: 10.1145/3710848

Copyright © 2025 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Natural Science Foundation of Fujian Province of China
Natural Science Foundation of China
Major Research Plan of the National Natural Science Foundation of China
National Key Research and Development Program of China
Xiaomi Young Scholar

Conference

PPoPP '25: The 30th ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming

March 1-5, 2025

Las Vegas, NV, USA

Acceptance Rates

Overall Acceptance Rate 230 of 1,014 submissions, 23%

Bibliometrics

Total Citations: 0
Total Downloads: 249
Downloads (Last 12 months): 249
Downloads (Last 6 weeks): 249

Reflects downloads up to 07 Mar 2025
