research-article

Public Access

JEDI: model-driven trace generation for cache simulations

Authors:

Anirudh Sabnis

Ramesh K. Sitaraman

IMC '22: Proceedings of the 22nd ACM Internet Measurement Conference

Pages 679 - 693

https://doi.org/10.1145/3517745.3561466

Published: 25 October 2022

Abstract

A major obstacle for caching research is the increasing difficulty of obtaining original traces from production caching systems. Original traces are voluminous and may contain private and proprietary information, and hence are generally not made available to the public. The lack of original traces hampers our ability to evaluate new cache designs and provides the rationale for JEDI, our new synthetic trace generation tool. JEDI generates a synthetic trace that is "similar" to the original trace collected from a production cache: the two traces have similar object-level properties and produce similar hit rates in a cache simulation. JEDI uses a novel traffic model called the Popularity-Size Footprint Descriptor (pFD), which concisely captures key properties of the original trace, and uses the pFD to generate the synthetic trace. We show that the synthetic traces produced by JEDI can be used to accurately simulate a wide range of cache admission and eviction algorithms, and that the hit rates obtained from these simulations correspond closely to those obtained from simulations that use the original traces. JEDI will be provided to the public as open source, along with a library of pFDs computed from traffic classes hosted on Akamai's production CDN. This will allow researchers to produce realistic synthetic traces for their own caching research.
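The evaluation criterion in the abstract, that an original trace and its synthetic counterpart should produce similar hit rates in a cache simulation, can be illustrated with a minimal sketch. It assumes a trace is a list of `(object_id, size_in_bytes)` requests in which each object keeps a fixed size, and it replays that trace through a plain byte-capacity LRU cache. The names `lru_hit_rates` and `hit_rate_gap` are hypothetical; this is not JEDI's code, and it does not model the pFD.

```python
from collections import OrderedDict
from typing import Dict, Iterable, Sequence, Tuple

# Assumption: a request is (object_id, size_in_bytes); each object keeps a fixed size.
Request = Tuple[str, int]


def lru_hit_rates(trace: Iterable[Request], capacity_bytes: int) -> Tuple[float, float]:
    """Replay a trace through a byte-capacity LRU cache.

    Returns (object hit rate, byte hit rate).
    """
    cache: "OrderedDict[str, int]" = OrderedDict()  # object_id -> size, least recent first
    used = 0
    requests = hits = 0
    bytes_requested = bytes_hit = 0
    for obj, size in trace:
        requests += 1
        bytes_requested += size
        if obj in cache:
            hits += 1
            bytes_hit += size
            cache.move_to_end(obj)  # refresh recency on a hit
            continue
        if size > capacity_bytes:
            continue  # larger than the whole cache: never admitted
        # Evict least recently used objects until the new object fits.
        while used + size > capacity_bytes:
            _, evicted_size = cache.popitem(last=False)
            used -= evicted_size
        cache[obj] = size
        used += size
    ohr = hits / requests if requests else 0.0
    bhr = bytes_hit / bytes_requested if bytes_requested else 0.0
    return ohr, bhr


def hit_rate_gap(original: Sequence[Request], synthetic: Sequence[Request],
                 capacities: Iterable[int]) -> Dict[int, float]:
    """Absolute object-hit-rate difference between two traces at each cache size."""
    gaps: Dict[int, float] = {}
    for capacity in capacities:
        original_ohr, _ = lru_hit_rates(original, capacity)
        synthetic_ohr, _ = lru_hit_rates(synthetic, capacity)
        gaps[capacity] = abs(original_ohr - synthetic_ohr)
    return gaps
```

For example, replaying `[("a", 10), ("b", 20), ("a", 10), ("c", 30), ("a", 10)]` through a 40-byte cache yields two hits out of five requests and 20 of 80 bytes served from cache, i.e. `(0.4, 0.25)`. Sweeping several capacities with `hit_rate_gap` mirrors, in miniature, the hit-rate comparison the abstract describes; a faithful evaluation would also cover the range of admission and eviction algorithms the paper studies, which this LRU-only sketch does not.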

Supplementary Material

Presentation video (M4V file, 567.m4v, 57.03 MB)


Cited By

L. Ait-Oucheggou, S. Rubini, A. Battou, and J. Boukhobza. QM-ARC: QoS-aware Multi-tier Adaptive Cache Replacement Strategy. Future Generation Computer Systems, 163:107548, 2025. Online publication date: Feb. 2025. https://doi.org/10.1016/j.future.2024.107548

K. Dozier, L. Salamatian, and D. Rubenstein. Analysis of False Negative Rates for Recycling Bloom Filters (Yes, They Happen!). Proceedings of the ACM on Measurement and Analysis of Computing Systems, 8(2):1-34, 2024. Online publication date: 29 May 2024. https://dl.acm.org/doi/10.1145/3656005


Published In

IMC '22: Proceedings of the 22nd ACM Internet Measurement Conference

October 2022

796 pages

ISBN: 9781450392594

DOI: 10.1145/3517745

General Chairs:
Chadi Barakat (Inria, Université Côte d'Azur)
Cristel Pelsser (University of Strasbourg)

Program Chairs:
Theophilus A. Benson (Brown University)
David Choffnes (Northeastern University)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

In-Cooperation

USENIX Association

Publisher

Association for Computing Machinery

New York, NY, United States


Funding Sources

National Science Foundation

Conference

IMC '22

Sponsor:

IMC '22: ACM Internet Measurement Conference

October 25 - 27, 2022

Nice, France

Acceptance Rates

Overall Acceptance Rate 277 of 1,083 submissions, 26%

Bibliometrics

Article Metrics

Total Citations: 2

Total Downloads: 450

Downloads (last 12 months): 221

Downloads (last 6 weeks): 22

Reflects downloads up to 06 Feb 2025
