Agility and Performance in Elastic Distributed Storage

Published: 31 October 2014

Abstract

Elastic storage systems can be expanded or contracted to meet current demand, allowing servers to be turned off or used for other tasks. However, the usefulness of an elastic distributed storage system is limited by its agility: how quickly it can increase or decrease its number of servers. Because they must migrate a large amount of data during elastic resizing, state-of-the-art designs usually have to make painful trade-offs among performance, elasticity, and agility.
This article describes the state of the art in elastic storage and a new system, called SpringFS, that can quickly change its number of active servers while still meeting elasticity and performance goals. SpringFS uses a novel technique, termed bounded write offloading, that restricts the set of servers to which writes aimed at overloaded servers are redirected. This technique, combined with the read offloading and passive migration policies used in SpringFS, minimizes the work needed before servers are deactivated or activated. Analysis of real-world traces from Hadoop deployments at Facebook and various Cloudera customers, along with experiments on the SpringFS prototype, confirms SpringFS’s agility, shows that it reduces the amount of data migrated for elastic resizing by up to two orders of magnitude, and shows that it cuts the percentage of active servers required by 67–82%, outdoing state-of-the-art designs by 6–120%.
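The idea of bounded write offloading can be illustrated with a minimal sketch. This is not the SpringFS implementation: the abstract does not give the placement rules, so the `Server` class, the `place_write` function, the fixed per-server `capacity`, the choice of the first `offload_set_size` servers as the offload set, and the least-loaded selection rule are all assumptions made for illustration. The sketch only shows the property named above: redirected writes can land only inside a restricted set of servers.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    """Hypothetical server model; not taken from the article."""
    server_id: int
    capacity: int                 # assumed: writes accepted before "overloaded"
    load: int = 0
    # Count of writes accepted on behalf of other (overloaded) servers,
    # keyed by the id of the server the write was originally aimed at.
    offloaded_from: dict = field(default_factory=dict)


def place_write(servers, target_id, offload_set_size):
    """Return the id of the server that accepts a write aimed at target_id.

    Assumption: the offload set is simply the first `offload_set_size`
    servers in the list. Servers outside that set never receive
    redirected writes.
    """
    target = servers[target_id]
    if target.load < target.capacity:
        # Target is not overloaded: the write goes where it was aimed.
        target.load += 1
        return target_id

    # Target is overloaded: redirect, but only within the bounded set.
    candidates = [
        s for s in servers[:offload_set_size]
        if s.server_id != target_id and s.load < s.capacity
    ]
    if not candidates:
        raise RuntimeError("no server in the offload set has spare capacity")

    # Assumed policy: pick the least-loaded candidate, lowest id on ties.
    chosen = min(candidates, key=lambda s: (s.load, s.server_id))
    chosen.load += 1
    chosen.offloaded_from[target_id] = chosen.offloaded_from.get(target_id, 0) + 1
    return chosen.server_id
```

In this sketch, servers outside the offload set hold no redirected writes, so deactivating them requires no cleanup of offloaded data. That is the intuition behind bounding the set; the actual bookkeeping in SpringFS may differ.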

Published In

ACM Transactions on Storage, Volume 10, Issue 4
Special Issue on USENIX FAST 2014
October 2014
102 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/2685385
Editor: Darrell Long
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 October 2014
Accepted: 01 September 2014
Received: 01 August 2014
Published in TOS Volume 10, Issue 4

Author Tags

  1. Cloud storage
  2. agility
  3. distributed file systems
  4. elastic storage
  5. power
  6. write offloading

Qualifiers

  • Research-article
  • Research
  • Refereed

Cited By

  • (2022) Bibliography. Storage Systems (641-693). DOI: 10.1016/B978-0-32-390796-5.00023-1. Online publication date: 2022
  • (2022) Database parallelism, big data and analytics, deep learning. Storage Systems (385-491). DOI: 10.1016/B978-0-32-390796-5.00017-6. Online publication date: 2022
  • (2020) How fast can one resize a distributed file system? Journal of Parallel and Distributed Computing. DOI: 10.1016/j.jpdc.2020.02.001. Online publication date: Mar-2020
  • (2019) Combining malleability and I/O control mechanisms to enhance the execution of multiple applications. Journal of Systems and Software 148 (21-36). DOI: 10.1016/j.jss.2018.11.006. Online publication date: Feb-2019
  • (2018) A performance modeling framework for lambda architecture based applications. Future Generation Computer Systems 86:C (1032-1041). DOI: 10.1016/j.future.2017.07.033. Online publication date: 1-Sep-2018
  • (2018) Risk Assessment and Monitoring in Intelligent Data-Centric Systems. Security and Resilience in Intelligent Data-Centric Systems and Communication Networks (29-52). DOI: 10.1016/B978-0-12-811373-8.00002-1. Online publication date: 2018
  • (2016) Enabling Space Elasticity in Storage Systems. Proceedings of the 9th ACM International on Systems and Storage Conference (1-11). DOI: 10.1145/2928275.2928291. Online publication date: 6-Jun-2016
  • (2016) Improving reliability and performances in large scale distributed applications with erasure codes and replication. Future Generation Computer Systems 56:C (773-782). DOI: 10.1016/j.future.2015.07.006. Online publication date: 1-Mar-2016
  • (2016) Performance Modeling of Big Data-Oriented Architectures. Resource Management for Big Data Platforms (3-34). DOI: 10.1007/978-3-319-44881-7_1. Online publication date: 28-Oct-2016
  • (2016) Modeling Replication and Erasure Coding in Large Scale Distributed Storage Systems Based on CEPH. Digitally Supported Innovation (273-284). DOI: 10.1007/978-3-319-40265-9_20. Online publication date: 28-Jul-2016
