Lazy Means Smart: Reducing Repair Bandwidth Costs in Erasure-coded Distributed Storage

Published: 30 June 2014
DOI: 10.1145/2611354.2611370

Abstract

Erasure coding schemes provide higher durability at lower storage cost, and thus constitute an attractive alternative to replication in distributed storage systems, in particular for storing rarely accessed "cold" data. These schemes, however, require an order of magnitude more recovery bandwidth to maintain a constant level of durability in the face of node failures. In this paper we propose lazy recovery, a technique that reduces recovery bandwidth demands to the level of replicated storage. The key insight is that a careful adjustment of the recovery rate substantially reduces recovery bandwidth, while keeping the impact on read performance and data durability low. We demonstrate the benefits of lazy recovery via extensive simulation using a realistic distributed storage configuration and published component failure parameters. For example, when applied to the commonly used RS(14, 10) code, lazy recovery reduces repair bandwidth by up to 76%, bringing it below even that of replication, while increasing the fraction of degraded stripes by only 0.1 percentage points. Lazy recovery works well with a variety of erasure coding schemes, including the recently introduced bandwidth-efficient codes, achieving up to a factor of two in additional bandwidth savings.
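
To make the bandwidth argument concrete, below is a minimal sketch of a lazy repair policy for an RS(n, k) code, assuming the RS(14, 10) parameters mentioned above. The `Stripe` structure, the `REPAIR_THRESHOLD` knob, and the block-read accounting are illustrative assumptions, not the paper's actual policy or simulator; the sketch only shows where the savings come from: one k-block read can reconstruct a whole batch of lost blocks, instead of paying k reads for every single lost block.

```python
# Sketch of lazy repair for an RS(n, k) erasure code (illustrative only).
# With RS(14, 10), each stripe has k = 10 data blocks and n - k = 4 parity
# blocks; any lost block can be rebuilt by reading k surviving blocks.
# Eager repair rebuilds after every failure, paying k reads per lost block.
# Lazy repair waits until a stripe has lost REPAIR_THRESHOLD blocks, then
# rebuilds the whole batch from a single k-block read.

from dataclasses import dataclass, field

N, K = 14, 10            # RS(14, 10), as in the abstract
REPAIR_THRESHOLD = 3     # hypothetical laziness knob: repair at 3 lost blocks

@dataclass
class Stripe:
    stripe_id: int
    lost_blocks: set = field(default_factory=set)

    def durable(self) -> bool:
        # Data survives while at least k of the n blocks remain readable.
        return len(self.lost_blocks) <= N - K

def on_block_failure(stripe: Stripe, block: int) -> int:
    """Record a failed block; return repair bandwidth spent, in block reads."""
    stripe.lost_blocks.add(block)
    if len(stripe.lost_blocks) < REPAIR_THRESHOLD and stripe.durable():
        return 0  # stay lazy: the stripe is degraded but still safe
    # Repair the batch: one k-block read reconstructs every lost block.
    reads, rebuilt = K, len(stripe.lost_blocks)
    stripe.lost_blocks.clear()
    print(f"stripe {stripe.stripe_id}: rebuilt {rebuilt} blocks with {reads} reads")
    return reads
```

Under these assumptions, eagerly repairing three failures costs 3 × 10 = 30 block reads, while the lazy batch costs 10, a roughly threefold reduction of the same flavor as the savings reported above. The price is that stripes spend more time degraded, which is the trade-off the abstract quantifies as a 0.1 percentage point increase in degraded stripes.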

Published In

SYSTOR 2014: Proceedings of International Conference on Systems and Storage
June 2014, 168 pages
ISBN: 9781450329200
DOI: 10.1145/2611354

In-Cooperation

• Technion: Israel Institute of Technology
• USENIX Association

Publisher

Association for Computing Machinery, New York, NY, United States

Author Tags

1. Distributed storage systems
2. Erasure codes
3. Repair bandwidth

Qualifiers

• Tutorial
• Research
• Refereed limited

Conference

SYSTOR 2014

Acceptance Rates

Overall Acceptance Rate: 94 of 285 submissions, 33%

