DOI: 10.1145/2043556.2043562

Design implications for enterprise storage systems via multi-dimensional trace analysis

Published: 23 October 2011

Abstract

Enterprise storage systems are facing enormous challenges due to increasing growth and heterogeneity of the data stored. Designing future storage systems requires comprehensive insights that existing trace analysis methods are ill-equipped to supply. In this paper, we seek to provide such insights by using a new methodology that leverages an objective, multi-dimensional statistical technique to extract data access patterns from network storage system traces. We apply our method on two large-scale real-world production network storage system traces to obtain comprehensive access patterns and design insights at user, application, file, and directory levels. We derive simple, easily implementable, threshold-based design optimizations that enable efficient data placement and capacity optimization strategies for servers, consolidation policies for clients, and improved caching performance for both.


Published In

SOSP '11: Proceedings of the Twenty-Third ACM Symposium on Operating Systems Principles
October 2011
417 pages
ISBN:9781450309776
DOI:10.1145/2043556
Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

  • Research-article

Conference

SOSP '11
Acceptance Rates

Overall Acceptance Rate 131 of 716 submissions, 18%

Cited By

  • (2023) Distributed File Systems for Cloud Storage Design and Evolution. 2023 First International Conference on Advances in Electrical, Electronics and Computational Intelligence (ICAEECI) (1-8). DOI: 10.1109/ICAEECI58247.2023.10370956. Online publication date: 19-Oct-2023.
  • (2022) A principled approach for selecting block I/O traces. Proceedings of the 14th ACM Workshop on Hot Topics in Storage and File Systems (52-58). DOI: 10.1145/3538643.3539754. Online publication date: 27-Jun-2022.
  • (2022) Dissecting the Workload of Cloud Storage System. 2022 IEEE 42nd International Conference on Distributed Computing Systems (ICDCS) (647-657). DOI: 10.1109/ICDCS54860.2022.00068. Online publication date: Jul-2022.
  • (2021) Lightweight Dynamic Redundancy Control with Adaptive Encoding for Server-based Storage. ACM Transactions on Storage 17:4 (1-38). DOI: 10.1145/3456292. Online publication date: 15-Oct-2021.
  • (2021) DStore. Proceedings of the 30th International Symposium on High-Performance Parallel and Distributed Computing (31-43). DOI: 10.1145/3431379.3460649. Online publication date: 21-Jun-2021.
  • (2021) SSD-based Workload Characteristics and Their Performance Implications. ACM Transactions on Storage 17:1 (1-26). DOI: 10.1145/3423137. Online publication date: 8-Jan-2021.
  • (2020) Characterizing Output Bottlenecks of a Production Supercomputer. ACM Transactions on Storage 15:4 (1-39). DOI: 10.1145/3335205. Online publication date: 16-Jan-2020.
  • (2020) I/O characteristic discovery for storage system optimizations. Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2020.08.005. Online publication date: Sep-2020.
  • (2019) Developing an Risk Signal Detection System Based on Opinion Mining for Financial Decision Support. Sustainability 11:16 (4258). DOI: 10.3390/su11164258. Online publication date: 7-Aug-2019.
  • (2019) Characterization of a Big Data Storage Workload in the Cloud. Proceedings of the 2019 ACM/SPEC International Conference on Performance Engineering (33-44). DOI: 10.1145/3297663.3310302. Online publication date: 4-Apr-2019.
