DOI: 10.1145/3219104.3229269

Evaluating the Fault Tolerance Performance of HDFS and Ceph

Published: 22 July 2018

Abstract

Large-scale distributed systems are collections of loosely coupled computers interconnected by a communication network. They are now an integral part of everyday life, underpinning large web applications, social networks, peer-to-peer systems, wireless sensor networks, and many more. Because each individual disk is prone to failure, a key challenge in designing such systems is their ability to tolerate faults; hence, fault tolerance mechanisms such as replication are widely used to keep data available at all times. Increasingly, however, systems also support a different mechanism, erasure coding (EC), which is claimed to provide high reliability at a lower storage cost than replication, albeit at the cost of performance. Our goal in this paper is to compare the performance and storage requirements of these two data reliability techniques in two open-source systems, HDFS and Ceph. The comparison is timely: the Apache Software Foundation's Apache Hadoop 3.0.0 release added support for EC, and Ceph has supported EC since its Firefly release (May 2014). We tested replication against EC in both systems using several benchmarks shipped with them. The results show trade-offs between replication and EC in terms of both performance and storage requirements.
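
To make the storage trade-off concrete, the sketch below compares the raw-storage overhead and the number of simultaneous disk failures tolerated by n-way replication versus a Reed-Solomon style (k, m) erasure code. It is an illustrative assumption, not the paper's measured setup: the abstract does not state which replication factor or EC profile was benchmarked, so 3-way replication and RS(6,3) (a policy Hadoop 3 ships) are used purely as examples.

# Illustrative comparison of n-way replication vs. a (k, m) erasure code.
# The parameters (3-way replication, RS(6,3)) are assumptions for this
# example; the paper's abstract does not specify the profiles it tested.

def replication_overhead(n: int) -> tuple[float, int]:
    # n full copies: raw/usable ratio is n; data survives losing n-1 copies.
    return float(n), n - 1

def erasure_overhead(k: int, m: int) -> tuple[float, int]:
    # Each stripe holds k data blocks plus m parity blocks: ratio (k+m)/k;
    # the stripe survives the loss of any m of its k+m blocks.
    return (k + m) / k, m

if __name__ == "__main__":
    schemes = {
        "3-way replication": replication_overhead(3),
        "RS(6,3) erasure coding": erasure_overhead(6, 3),
    }
    for label, (ratio, faults) in schemes.items():
        print(f"{label:24s} overhead = {ratio:.2f}x, tolerates {faults} failure(s)")
    # Output:
    #   3-way replication        overhead = 3.00x, tolerates 2 failure(s)
    #   RS(6,3) erasure coding   overhead = 1.50x, tolerates 3 failure(s)

Under these assumed parameters, the erasure code halves the raw-storage cost of 3-way replication while tolerating one more simultaneous failure, which is why EC is attractive on storage grounds; the performance cost the paper measures comes from encoding and from reconstructing data out of surviving blocks.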




Published In

PEARC '18: Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity
July 2018
652 pages
ISBN: 9781450364461
DOI: 10.1145/3219104
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Ceph
  2. Distributed Storage Systems
  3. Fault Tolerance
  4. HDFS
  5. Performance Evaluation

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

PEARC '18

Acceptance Rates

PEARC '18 paper acceptance rate: 79 of 123 submissions, 64%.
Overall acceptance rate: 133 of 202 submissions, 66%.


Cited By

  • (2023) Efficient Management and Scheduling of Massive Remote Sensing Image Datasets. ISPRS International Journal of Geo-Information 12(5), 199. https://doi.org/10.3390/ijgi12050199. Online publication date: 13-May-2023.
  • (2021) A Ceph-based storage strategy for big gridded remote sensing data. Big Earth Data 6(3), 323-339. https://doi.org/10.1080/20964471.2021.1989792. Online publication date: 27-Dec-2021.
