DOI: 10.1145/3219104.3229269

Evaluating the Fault Tolerance Performance of HDFS and Ceph

Published: 22 July 2018

Abstract

Large-scale distributed systems are collections of loosely coupled computers interconnected by a communication network. They are now an integral part of everyday life, underpinning large web applications, social networks, peer-to-peer systems, wireless sensor networks, and many more. Because each individual disk is prone to failure, a key challenge in designing such systems is their ability to tolerate faults; hence, fault tolerance mechanisms such as replication are widely used to keep data available at all times. Increasingly, however, systems also support a different mechanism, erasure coding (EC), which is claimed to provide high reliability at a lower storage cost than replication, albeit at the cost of performance. Our goal in this paper is to compare the performance and storage requirements of these two data reliability techniques in two open-source systems, HDFS and Ceph. The comparison is timely: the Apache Software Foundation's Apache Hadoop 3.0.0 release added support for EC, and Ceph has supported EC since its Firefly release (May 2014). We tested replication against EC in both systems using several benchmarks shipped with them. The results show trade-offs between replication and EC in terms of both performance and storage requirements.
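
To make the storage trade-off concrete, the sketch below compares the raw-storage overhead and the number of simultaneous disk failures tolerated by n-way replication versus a Reed-Solomon style (k, m) erasure code. It is an illustrative assumption, not the paper's measured setup: the abstract does not state which replication factor or EC profile was benchmarked, so 3-way replication and RS(6,3) (a policy Hadoop 3 ships) are used purely as examples.

# Illustrative comparison of n-way replication vs. a (k, m) erasure code.
# The parameters (3-way replication, RS(6,3)) are assumptions for this
# example; the paper's abstract does not specify the profiles it tested.

def replication_overhead(n: int) -> tuple[float, int]:
    # n full copies: raw/usable ratio is n; data survives losing n-1 copies.
    return float(n), n - 1

def erasure_overhead(k: int, m: int) -> tuple[float, int]:
    # Each stripe holds k data blocks plus m parity blocks: ratio (k+m)/k;
    # the stripe survives the loss of any m of its k+m blocks.
    return (k + m) / k, m

if __name__ == "__main__":
    schemes = {
        "3-way replication": replication_overhead(3),
        "RS(6,3) erasure coding": erasure_overhead(6, 3),
    }
    for label, (ratio, faults) in schemes.items():
        print(f"{label:24s} overhead = {ratio:.2f}x, tolerates {faults} failure(s)")
    # Output:
    #   3-way replication        overhead = 3.00x, tolerates 2 failure(s)
    #   RS(6,3) erasure coding   overhead = 1.50x, tolerates 3 failure(s)

Under these assumed parameters, the erasure code halves the raw-storage cost of 3-way replication while tolerating one more simultaneous failure, which is why EC is attractive on storage grounds; the performance cost the paper measures comes from encoding and from reconstructing data out of surviving blocks.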




Published In

PEARC '18: Proceedings of the Practice and Experience on Advanced Research Computing: Seamless Creativity
July 2018
652 pages
ISBN: 9781450364461
DOI: 10.1145/3219104
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States


Author Tags

  1. Ceph
  2. Distributed Storage Systems
  3. Fault Tolerance
  4. HDFS
  5. Performance Evaluation

Qualifiers

  • Extended-abstract
  • Research
  • Refereed limited

Conference

PEARC '18

Acceptance Rates

PEARC '18 paper acceptance rate: 79 of 123 submissions, 64%.
Overall acceptance rate: 133 of 202 submissions, 66%.


Cited By

  • (2023) Efficient Management and Scheduling of Massive Remote Sensing Image Datasets. ISPRS International Journal of Geo-Information 12(5), 199. https://doi.org/10.3390/ijgi12050199. Online publication date: 13-May-2023.
  • (2021) A Ceph-based storage strategy for big gridded remote sensing data. Big Earth Data 6(3), 323-339. https://doi.org/10.1080/20964471.2021.1989792. Online publication date: 27-Dec-2021.
