DOI: 10.1145/3517745.3561466
Research article

JEDI: model-driven trace generation for cache simulations

Published: 25 October 2022

Abstract

A major obstacle for caching research is the increasing difficulty of obtaining original traces from production caching systems. Original traces are voluminous and may contain private and proprietary information, and hence are not generally made available to the public. The lack of original traces hampers our ability to evaluate new cache designs and provides the rationale for JEDI, our new synthetic trace generation tool. JEDI generates a synthetic trace that is "similar" to the original trace collected from a production cache; in particular, the two traces have similar object-level properties and produce similar hit rates in a cache simulation. JEDI uses a novel traffic model called the Popularity-Size Footprint Descriptor (pFD), which concisely captures key properties of the original trace, and uses the pFD to generate the synthetic trace. We show that the synthetic traces produced by JEDI can be used to accurately simulate a wide range of cache admission and eviction algorithms, and that the hit rates obtained from these simulations correspond closely to those obtained from simulations that use the original traces. JEDI will be provided to the public as open source, along with a library of pFDs computed from traffic classes hosted on Akamai's production CDN. This will allow researchers to produce realistic synthetic traces for their own caching research.
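
To make the notion of "similar hit rates in a cache simulation" concrete, the following is a minimal sketch of how one might replay two traces through the same cache and compare request hit rates. It is not JEDI's generation algorithm and does not use the pFD; the function name `lru_hit_rate`, the `(object_id, size)` trace format, and the toy traces are all assumptions made for illustration, and LRU is chosen only as a simple example of an eviction policy.

```python
from collections import OrderedDict


def lru_hit_rate(trace, cache_size_bytes):
    """Replay (object_id, size) requests through a byte-capacity LRU cache.

    Returns the request hit rate (hits / total requests).
    Hypothetical helper for illustration; not part of JEDI.
    """
    cache = OrderedDict()  # object_id -> size, least recently used first
    used = 0               # bytes currently stored
    hits = 0
    for obj_id, size in trace:
        if obj_id in cache:
            hits += 1
            cache.move_to_end(obj_id)  # mark as most recently used
            continue
        if size > cache_size_bytes:
            continue  # object can never fit; counted as a miss
        # Evict least recently used objects until the new one fits.
        while used + size > cache_size_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[obj_id] = size
        used += size
    return hits / len(trace) if trace else 0.0


# Toy traces (assumed data): same popularity and size structure, different ids.
original = [("a", 2), ("b", 3), ("a", 2), ("c", 4), ("a", 2), ("b", 3)]
synthetic = [("x", 2), ("y", 3), ("x", 2), ("z", 4), ("x", 2), ("y", 3)]

gap = abs(lru_hit_rate(original, 6) - lru_hit_rate(synthetic, 6))
print(f"hit-rate gap at cache size 6: {gap:.3f}")
```

In this framing, a synthetic trace is useful to the extent that the gap stays small across the cache sizes and policies of interest; a real comparison would sweep both.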

Cited By

  • (2025) QM-ARC: QoS-aware Multi-tier Adaptive Cache Replacement Strategy. Future Generation Computer Systems 163 (107548). DOI: 10.1016/j.future.2024.107548. Online publication date: Feb-2025
  • (2024) Analysis of False Negative Rates for Recycling Bloom Filters (Yes, They Happen!). Proceedings of the ACM on Measurement and Analysis of Computing Systems 8:2 (1-34). DOI: 10.1145/3656005. Online publication date: 29-May-2024

Published In

IMC '22: Proceedings of the 22nd ACM Internet Measurement Conference
October 2022
796 pages
ISBN: 9781450392594
DOI: 10.1145/3517745

In-Cooperation

  • USENIX Assoc

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

IMC '22: ACM Internet Measurement Conference
October 25 - 27, 2022
Nice, France

Acceptance Rates

Overall Acceptance Rate 277 of 1,083 submissions, 26%
