DOI: 10.1145/2742854.2742869
research-article

Cooperative repair based on tree structure for multiple failures in distributed storage systems with regenerating codes

Published: 06 May 2015

Abstract

Regenerating codes have been proposed to achieve an optimal trade-off between storage space and the network traffic required for repair. However, existing repair schemes based on regenerating codes cannot meet the requirements of low network traffic cost and high efficiency when repairing multiple failures. In this paper, we propose CTREE, a cooperative repair scheme based on tree structure for multiple failures with regenerating codes. For generality, we propose a two-layer repair framework that supports the repair of both single and multiple failures. For high repair efficiency, we propose a parallel tree-structured data transmission technique to organize the data transmissions between the providers and the newcomers. For low network traffic cost, we propose a core-based data exchange technique to organize the data exchanges between the coordinator and the other newcomers. To evaluate the performance of CTREE, we conduct experiments on both 30 physical and 200 virtual servers. Numerical analysis and extensive experiments confirm that CTREE supports both single and multiple failure repairs, and that it significantly reduces the network traffic cost and improves the repair efficiency compared with state-of-the-art approaches under various parameter settings.
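
The abstract gives no algorithmic detail, so the following is only a minimal sketch of the general idea behind tree-structured data transmission, not the CTREE algorithm itself. It assumes a single newcomer, symmetric link bandwidths, and that every tree link carries the same amount of data, so the slowest link bounds the repair time. Under those assumptions, a maximum spanning tree rooted at the newcomer maximizes the bottleneck bandwidth. The node names, bandwidth values, and function names are hypothetical.

```python
import heapq

def max_bottleneck_tree(bandwidth, root):
    """Prim-style maximum spanning tree rooted at `root`.

    `bandwidth[u][v]` is the (assumed symmetric) link bandwidth between
    u and v. Returns a dict mapping each non-root node to its parent.
    """
    parent = {}
    in_tree = {root}
    # Max-heap via negated bandwidths: (-bandwidth, tree_node, outside_node).
    heap = [(-bw, root, v) for v, bw in bandwidth[root].items()]
    heapq.heapify(heap)
    while heap and len(in_tree) < len(bandwidth):
        _, u, v = heapq.heappop(heap)
        if v in in_tree:
            continue
        in_tree.add(v)
        parent[v] = u
        for w, bw in bandwidth[v].items():
            if w not in in_tree:
                heapq.heappush(heap, (-bw, v, w))
    return parent

def bottleneck(bandwidth, parent):
    """Smallest link bandwidth used by the tree (or star) `parent`."""
    return min(bandwidth[child][par] for child, par in parent.items())

# Hypothetical example: newcomer "n" and providers "p1".."p3".
bandwidth = {
    "n":  {"p1": 10, "p2": 2, "p3": 3},
    "p1": {"n": 10, "p2": 8, "p3": 1},
    "p2": {"n": 2, "p1": 8, "p3": 9},
    "p3": {"n": 3, "p1": 1, "p2": 9},
}

# Star topology: every provider sends directly to the newcomer.
star = {p: "n" for p in bandwidth if p != "n"}
tree = max_bottleneck_tree(bandwidth, "n")

star_bottleneck = bottleneck(bandwidth, star)  # 2
tree_bottleneck = bottleneck(bandwidth, tree)  # 8
```

In this example the star topology is limited by the slow direct link between n and p2, while the tree relays p2's and p3's data through p1 and raises the bottleneck from 2 to 8.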

Cited By

  • (2024) Adaptive Recovery with Reinforcement Learning in Cloud-of-Clouds Storage Systems. Frontiers of Networking Technologies, pages 48-60. DOI: 10.1007/978-981-97-3890-8_4. Online publication date: 10-Jul-2024
  • (2017) Cloud storage reliability for Big Data applications. Journal of Network and Computer Applications, 97:C, pages 35-47. DOI: 10.1016/j.jnca.2017.08.011. Online publication date: 1-Nov-2017

Published In

CF '15: Proceedings of the 12th ACM International Conference on Computing Frontiers
May 2015
413 pages
ISBN:9781450333580
DOI:10.1145/2742854
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. multiple failures
  2. network traffic cost
  3. regenerating codes
  4. repair efficiency

Qualifiers

  • Research-article

Funding Sources

  • Natural Science Foundation for Distinguished Young Scholars of Hunan Province
  • Specialized Research Fund for the Doctoral Program of Higher Education
  • National Natural Science Foundation of China
  • National High Technology Research and Development 863 Program of China

Conference

CF'15: Computing Frontiers Conference
May 18 - 21, 2015
Ischia, Italy

Acceptance Rates

CF '15 Paper Acceptance Rate 33 of 96 submissions, 34%;
Overall Acceptance Rate 273 of 785 submissions, 35%
