DOI: 10.1145/3526064.3534110
research-article
Open access

Access Trends of In-network Cache for Scientific Data

Published: 27 June 2022

Abstract

Scientific collaborations increasingly rely on large volumes of data for their work, and many of them employ tiered systems to replicate the data to their worldwide user communities. Each user in a community often selects a different subset of data for analysis tasks; however, members of a research group often work on related research topics that require similar data objects. Thus, there is significant potential for data sharing. In this work, we study the access traces of a federated storage cache known as the Southern California Petabyte Scale Cache. By studying the access patterns and the potential for network traffic reduction by this caching system, we aim to explore the predictability of cache usage and the potential for more general in-network data caching. Our study shows that this distributed storage cache is able to reduce the network traffic volume by a factor of 2.35 during part of the study period. We further show that machine learning models could predict cache utilization with an accuracy of 0.88. This demonstrates that such cache usage is predictable, which could be useful for managing complex networking resources such as in-network caching.
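
To make the traffic reduction figure concrete, the following minimal sketch computes a reduction factor under one plausible definition: total bytes requested by users divided by the bytes that still had to be fetched over the wide-area network (the cache misses). The abstract does not state the formula, so this definition, the function name `traffic_reduction_factor`, and the example volumes are all assumptions made for illustration, not values from the study.

```python
def traffic_reduction_factor(hit_bytes: float, miss_bytes: float) -> float:
    """Ratio of all requested bytes to the bytes fetched from remote storage.

    Assumed definition (not taken from the abstract):
        (hit_bytes + miss_bytes) / miss_bytes
    A value of 1.0 means the cache saved nothing; larger is better.
    """
    if miss_bytes <= 0:
        raise ValueError("miss_bytes must be positive")
    return (hit_bytes + miss_bytes) / miss_bytes


# Hypothetical volumes in terabytes, invented for illustration only.
example = traffic_reduction_factor(hit_bytes=135.0, miss_bytes=100.0)
print(round(example, 2))
```

Under this assumed definition, a factor of 2.35 would mean that for every byte transferred over the wide-area network, 1.35 additional bytes were served from the cache.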

Cited By

  • (2024) Experiences in deploying in-network data caches. EPJ Web of Conferences 295 (07018). DOI: 10.1051/epjconf/202429507018. Online publication date: 6-May-2024
  • (2024) Predicting Resource Utilization Trends with Southern California Petabyte Scale Cache. EPJ Web of Conferences 295 (01044). DOI: 10.1051/epjconf/202429501044. Online publication date: 6-May-2024
  • (2023) Effectiveness and predictability of in-network storage cache for Scientific Workflows. 2023 International Conference on Computing, Networking and Communications (ICNC) (226-230). DOI: 10.1109/ICNC57223.2023.10074058. Online publication date: 20-Feb-2023

Information & Contributors

Information

Published In

SNTA '22: Fifth International Workshop on Systems and Network Telemetry and Analytics
June 2022
62 pages
ISBN: 9781450393157
DOI: 10.1145/3526064
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. data pattern
  2. network cache
  3. prediction
  4. resource utilization
  5. xcache

Qualifiers

  • Research-article

Funding Sources

  • US Dept. of Energy, Office of Science, Office of Advanced Scientific Computing Research
  • U.S. National Science Foundation

Conference

HPDC '22

Acceptance Rates

Overall Acceptance Rate 22 of 106 submissions, 21%

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 58
  • Downloads (Last 6 weeks): 18
Reflects downloads up to 12 Sep 2024
