Desperately seeking ... optimal multi-tier cache configurations

Authors:

Pranav Bhandari,

Erez Zadok

HotStorage'20: Proceedings of the 12th USENIX Conference on Hot Topics in Storage and File Systems

Article No.: 6, Page 6

Published: 13 July 2020

Abstract

Modern cache hierarchies are tangled webs of complexity. Multiple tiers of heterogeneous physical and virtual devices, with many configurable parameters, all contend to optimally serve swarms of requests between local and remote applications. The challenge of effectively designing these systems is exacerbated by continuous advances in hardware and firmware, innovation in cache eviction algorithms, and evolving workloads and access patterns. This rapidly expanding configuration space has made it costly and time-consuming to physically experiment with numerous cache configurations for even a single stable workload. Current cache evaluation techniques (e.g., Miss Ratio Curves) are short-sighted: they analyze only a single tier of cache, focus primarily on performance, and fail to examine the critical relationships between metrics like throughput and monetary cost. Publicly available I/O cache simulators are also lacking: they can only simulate a fixed or limited number of cache tiers, are missing key features, or offer limited analyses.
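
For readers unfamiliar with the single-tier technique named above, a Miss Ratio Curve can be sketched in a few lines. This is an illustrative sketch only, not the authors' code and not the PyMimircache API: it assumes an LRU eviction policy and a trace given as a list of block IDs, and the helper names `lru_miss_ratio` and `miss_ratio_curve` are hypothetical.

```python
from collections import OrderedDict


def lru_miss_ratio(trace, capacity):
    """Replay `trace` through one LRU cache that holds `capacity` blocks."""
    cache = OrderedDict()
    misses = 0
    for block in trace:
        if block in cache:
            cache.move_to_end(block)       # hit: mark as most recently used
        else:
            misses += 1
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used block
    return misses / len(trace)


def miss_ratio_curve(trace, sizes):
    """Map each candidate cache size to its miss ratio (one replay per size)."""
    return {size: lru_miss_ratio(trace, size) for size in sizes}


# Toy trace of block IDs (made up for illustration).
trace = [1, 2, 3, 1, 2, 3, 4, 1, 2, 3]
mrc = miss_ratio_curve(trace, [1, 2, 3, 4])
print(mrc)  # {1: 1.0, 2: 1.0, 3: 0.7, 4: 0.4}
```

The curve describes one tier and one metric only, which is the limitation the abstract points out: it says nothing about how a second tier, device latency, or purchase cost would change the picture.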

It is our position that best practices in cache analysis should include the evaluation of multi-tier configurations, coupled with more comprehensive metrics that reveal critical design trade-offs, especially monetary costs. We are developing an n-level I/O cache simulator that is general enough to model any cache hierarchy, captures many metrics, provides a robust set of analysis features, and is easily extensible to facilitate experimental research or production-level provisioning. To demonstrate the value of our proposed metrics and simulator, we extended an existing cache simulator (PyMimircache). We present several interesting and counter-intuitive results in this paper.
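
To make the idea of multi-tier evaluation with cost-aware metrics concrete, the following is a minimal sketch of an n-tier simulation. It is not the authors' simulator and does not use PyMimircache. Everything in it is an assumption chosen for illustration: the `Tier` class and `simulate` function are hypothetical names, every tier uses LRU, a block that misses in a tier is inserted into it afterwards, and the latencies and per-block prices are made-up numbers.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass
class Tier:
    name: str
    capacity: int           # number of blocks the tier can hold
    latency_us: float       # assumed per-access latency (illustrative)
    cost_per_block: float   # assumed purchase cost in dollars (illustrative)
    blocks: OrderedDict = field(default_factory=OrderedDict)
    hits: int = 0

    def lookup(self, block):
        if block in self.blocks:
            self.blocks.move_to_end(block)   # refresh LRU position
            self.hits += 1
            return True
        return False

    def insert(self, block):
        self.blocks[block] = True
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict least recently used


def simulate(trace, tiers, backend_latency_us):
    """Replay `trace` through `tiers` (fastest first) and collect metrics."""
    total_latency = 0.0
    for block in trace:
        missed = []
        for tier in tiers:
            total_latency += tier.latency_us   # every probed tier adds latency
            if tier.lookup(block):
                break
            missed.append(tier)
        else:
            total_latency += backend_latency_us  # missed in every tier
        for tier in missed:
            tier.insert(block)                   # fill the tiers that missed
    return {
        "hits_per_tier": {t.name: t.hits for t in tiers},
        "mean_latency_us": total_latency / len(trace),
        "cost_dollars": sum(t.capacity * t.cost_per_block for t in tiers),
    }


trace = [1, 2, 3, 1, 2, 3, 4, 1, 2, 3]
tiers = [Tier("dram", 2, 1.0, 4.0), Tier("ssd", 4, 100.0, 0.5)]
result = simulate(trace, tiers, backend_latency_us=10000.0)
print(result)
```

On this toy trace the small top tier never hits while the larger second tier absorbs six of the ten requests, so reporting latency and cost side by side for several tier sizes is what exposes the trade-off; a single-tier hit ratio alone would not.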

Published In

HotStorage '20: Proceedings of the 12th USENIX Conference on Hot Topics in Storage and File Systems

July 2020

12 pages

Program Chairs:
Anirudh Badam (Microsoft),
Vijay Chidambaram (The University of Texas at Austin and VMware Research)

Copyright © 2020.

Sponsors

ORACLE
VMware

Publisher

USENIX Association

United States

Qualifiers

Research-article

Acceptance Rates

Overall Acceptance Rate 34 of 87 submissions, 39%
