Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/2254756.2254766acmconferencesArticle/Chapter ViewAbstractPublication PagesmetricsConference Proceedingsconference-collections
research-article

Workload analysis of a large-scale key-value store

Published: 11 June 2012 Publication History

Abstract

Key-value stores are a vital component in many scale-out enterprises, including social networks, online retail, and risk analysis. Accordingly, they are receiving increased attention from the research community in an effort to improve their performance, scalability, reliability, cost, and power consumption. To be effective, such efforts require a detailed understanding of realistic key-value workloads. And yet little is known about these workloads outside of the companies that operate them. This paper aims to address this gap.
To this end, we have collected detailed traces from Facebook's Memcached deployment, arguably the world's largest. The traces capture over 284 billion requests from five different Memcached use cases over several days. We analyze the workloads from multiple angles, including: request composition, size, and rate; cache efficacy; temporal patterns; and application use cases. We also propose a simple model of the most representative trace to enable the generation of more realistic synthetic workloads by the community.
Our analysis details many characteristics of the caching workload. It also reveals a number of surprises: a GET/SET ratio of 30:1 that is higher than assumed in the literature; some applications of Memcached behave more like persistent storage than a cache; strong locality metrics, such as keys accessed many millions of times a day, do not always suffice for a high hit rate; and there is still room for efficiency and hit rate improvements in Memcached's implementation. Toward the last point, we make several suggestions that address the exposed deficiencies.

References

[1]
http://voldemort-project.com.
[2]
Ahmad, I. Easy and efficient disk I/O workload characterization in VMware ESX server. In Proceedings of IEEE International Symposium on Workload Characterization (Sept. 2007).
[3]
Anand, A., Muthukrishnan, C., Kappes, S., Akella, A., and Nath, S. Cheap and large CAMs for high performance data-intensive networked systems. In Proceedings of the 7th USENIX conference on Networked Systems Design and Implementation (Apr. 2010).
[4]
Andersen, D. G., Franklin, J., Kaminsky, M., Phanishayee, A., Tan, L., and Vasudevan, V. FAWN: a fast array of wimpy nodes. In Proceedings of the 22nd ACM SIGOPS Symposium on Operating Systems Principles (SOSP) (Big Sky, Montana, 2009), ACM, pp. 1--14.
[5]
Arlitt, M., Friedrich, R., and Jin, T. Workload characterization of a web proxy in a cable modem environment. ACM SIGMETRICS - Performance Evaluation Review 27 (1999), 25--36.
[6]
Arlitt, M. F., and Williamson, C. L. Internet web servers: Workload characterization and performance implications. IEEE/ACM Transactions on Networking 5 (October 1997), 631--645.
[7]
Barford, P., Bestavros, A., Bradley, A., and Crovella, M. Changes in web client access patterns. In World Wide Web Journal, Special Issue on Characterization and Performance Evaluation (1999).
[8]
Beaver, D., Kumar, S., Li, H. C., Sobel, J., and Vajgel, P. Finding a needle in haystack: Facebook's photo storage. In Proceedings of the 10th USENIX Symposium on Operating Systems Design and Implementation (Oct. 2010).
[9]
Berezecki, M., Frachtenberg, E., Paleczny, M., and Steele, K. Many-core key-value store. In Proceedings of the Second International Green Computing Conference (Orlando, FL, Aug. 2011).
[10]
Breslau, L., Cao, P., Fan, L., Phillips, G., and Shenker, S. Web caching and zipf-like distributions: Evidence and implications. In Proceedings of the 18th Annual IEEE International Conference on Computer Communications (1999).
[11]
Carns, P., Latham, R., Ross, R., Kamil Iskra, S. L., and Riley, K. 24/7 characterization of petascale I/O workloads. In Proceedings of the 4th Workshop on Interfaces and Architectures for Scientific Data Storage (Nov. 2009).
[12]
Cunha, C. R., Bestavros, A., and Crovella, M. E. Characteristics of WWW client-based traces. In Technical Report TR-95-010, Boston University Department of Computer Science, (July 1995).
[13]
Debnath, B. K., Sengupta, S., and Li, J. Flashstore: High throughput persistent key-value store. Proceedings of 36th International Conference on Very Large Data Bases (VLDB) 3, 2 (2010).
[14]
Debnath, B. K., Sengupta, S., and Li, J. SkimpyStash: RAM space skimpy key-value store on flash-based storage. In Proceedings of the Annual ACM SIGMOD Conference (June 2010), pp. 25--36.
[15]
DeCandia, G., Hastorun, D., Jampani, M., Kakulapati, G., Pilchin, A., Sivasubramanian, S., Vosshall, P., and Vogels, W. Dynamo: Amazon's highly available key-value store. In Proceedings of the 21st ACM SIGOPS Symposium on Operating Systems Principles (SOSP) (Stevenson, WA, 2007), pp. 205--220.
[16]
Duska, B. M., Marwood, D., and Feeley, M. J. The measured access characteristics of world-wide web client proxy caches. In Proceedings of USENIX Symposium of Internet Technologies and Systems (Dec. 1997).
[17]
Feitelson, D. G. Workload modeling for performance evaluation. In Performance Evaluation of Complex Systems: Techniques and Tools, M. C. Calzarossa and S. Tucci, Eds., vol. 2459 of Lecture Notes in Computer Science. Springer-Verlag, Sept. 2002, pp. 114--141. www.cs.huji.ac.il/ feit/papers/WorkloadModel02chap.ps.gz.
[18]
Fitzpatrick, B. Distributed caching with memcached. Linux Journal, 124 (Aug. 2004), 72--78. www.linuxjournal.com/article/7451?page=0,0.
[19]
Jiang, S., and Zhang, X. LIRS: an efficient low inter-reference recency set replacement policy to improve buffer cache performance. In Proceedings of the 2002 ACM SIGMETRICS international conference on Measurement and modeling of computer systems (2002), SIGMETRICS'02, ACM, pp. 31--42.
[20]
Kavalanekar, S., Worthington, B., Zhang, Q., and Sharda, V. Characterization of storage workload traces from production windows servers. In Proceedings of IEEE International Symposium on Workload Characterization (Sept. 2008).
[21]
Keeton, K., Alistair Veitch, D. O., and Wilkes, J. I/O characterization of commercial workloads. In Proceedings of the 3rd Workshop on Computer Architecture Evaluation using Commercial Workloads (Jan. 2000).
[22]
Kim, Y., Gunasekaran, R., Shipman, G. M., Dillow, D. A., Zhang, Z., and Settlemyer, B. W. Workload characterization of a leadership class storage cluster. In Proceedings of Petascale Data Storage Workshop (Nov. 2010).
[23]
Lim, H., Fan, B., Andersen, D. G., and Kaminsky, M. Silt: A memory-eficient, high-performance key-value store. In Proceedings of the 23rd ACM Symposium on Operating Systems Principles (Oct. 2011).
[24]
Lublin, U., and Feitelson, D. G. The workload on parallel supercomputers: Modeling the characteristics of rigid jobs. Journal of Parallel and Distributed Computing 63, 11 (Nov. 2003), 1105--1122. www.cs.huji.ac.il/ feit/papers/Rigid01TR.ps.gz.
[25]
Manley, S., and Seltzer, M. Web facts and fantasy. In Proceedings of USENIX Symposium on Internet Technologies and Systems (Dec. 1997).
[26]
Petrovic, J. Using Memcached for data distribution in industrial environment. In Proceedings of the Third International Conference on Systems (Washington, DC, 2008), IEEE Computer Society, pp. 368--372.
[27]
Reddi, V. J., Lee, B. C., Chilimbi, T., and Vaid, K. Web search using mobile cores: Quantifying and mitigating the price of efficiency. In Proceedings of the 37th International Symposium on Computer Architecture (ISCA) (June 2010), ACM. portal.acm.org/citation.cfm?id=1815961.1816002.
[28]
Roselli, D., Lorch, J. R., and Anderson, T. E. A comparison of file system workloads. In Proceedings of the 2000 USENIX Annual Technical Conference (June 2000).
[29]
Vasudevan, V. R. Energy-Efficient Data-intensive Computing with a Fast Array of Wimpy Nodes. PhD thesis, Carnegie Mellon University, Oct. 2011.
[30]
Wang, F., Xin, Q., Hong, B., Miller, E. L., Long, D. D. E., Brandt, S. A., and McLarty, T. T. File system workload analysis for large scientific computing applications. In Proceedings of 21st IEEE / 12th NASA Goddard Conference on Mass Storage Systems and Technologies (Apr. 2004).
[31]
Zhao, H. HipHop for PHP: Move fast. https://developers.facebook.com/blog/post/358/, Feb. 2010.

Cited By

View all
  • (2024)ELECTProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650715(293-310)Online publication date: 27-Feb-2024
  • (2024)Hybrid fault tolerance in distributed in-memory storage systemsJUSTC10.52396/JUSTC-2022-012554:4(0406)Online publication date: 2024
  • (2024)SSEPC cloud: Carbon footprint aware power efficient virtual machine placement in cloud milieuComputer Science and Information Systems10.2298/CSIS230923018P21:3(759-780)Online publication date: 2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMETRICS '12: Proceedings of the 12th ACM SIGMETRICS/PERFORMANCE joint international conference on Measurement and Modeling of Computer Systems
June 2012
450 pages
ISBN:9781450310970
DOI:10.1145/2254756
  • cover image ACM SIGMETRICS Performance Evaluation Review
    ACM SIGMETRICS Performance Evaluation Review  Volume 40, Issue 1
    Performance evaluation review
    June 2012
    433 pages
    ISSN:0163-5999
    DOI:10.1145/2318857
    Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2012

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. key-value store
  2. memcached
  3. workload analysis
  4. workload modeling

Qualifiers

  • Research-article

Conference

SIGMETRICS '12
Sponsor:

Acceptance Rates

Overall Acceptance Rate 459 of 2,691 submissions, 17%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)536
  • Downloads (Last 6 weeks)62
Reflects downloads up to 09 Nov 2024

Other Metrics

Citations

Cited By

View all
  • (2024)ELECTProceedings of the 22nd USENIX Conference on File and Storage Technologies10.5555/3650697.3650715(293-310)Online publication date: 27-Feb-2024
  • (2024)Hybrid fault tolerance in distributed in-memory storage systemsJUSTC10.52396/JUSTC-2022-012554:4(0406)Online publication date: 2024
  • (2024)SSEPC cloud: Carbon footprint aware power efficient virtual machine placement in cloud milieuComputer Science and Information Systems10.2298/CSIS230923018P21:3(759-780)Online publication date: 2024
  • (2024)Runtime Tests for Memory Error Handlers of In-Memory Key Value Stores Using MemFIIEICE Transactions on Information and Systems10.1587/transinf.2024EDP7019E107.D:11(1408-1421)Online publication date: 1-Nov-2024
  • (2024)LITS: An Optimized Learned Index for StringsProceedings of the VLDB Endowment10.14778/3681954.368201017:11(3415-3427)Online publication date: 1-Jul-2024
  • (2024)SepHash: A Write-Optimized Hash Index On Disaggregated Memory via Separate Segment StructureProceedings of the VLDB Endowment10.14778/3641204.364121817:5(1091-1104)Online publication date: 1-Jan-2024
  • (2024)Can OS Specialization give new life to old carbon in the cloud?Proceedings of the 17th ACM International Systems and Storage Conference10.1145/3688351.3689158(83-90)Online publication date: 16-Sep-2024
  • (2024)Toward an Edge-Friendly Distributed Object Store for Serverless FunctionsProceedings of the 15th ACM SIGOPS Asia-Pacific Workshop on Systems10.1145/3678015.3680485(108-114)Online publication date: 4-Sep-2024
  • (2024)On the Feasibility and Benefits of Extensive EvaluationProceedings of the ACM on Management of Data10.1145/36771372:4(1-24)Online publication date: 30-Sep-2024
  • (2024)A Memory-Disaggregated Radix TreeACM Transactions on Storage10.1145/366428920:3(1-41)Online publication date: 6-Jun-2024
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media