DOI: 10.1145/1989323.1989438
research-article

Apache Hadoop goes realtime at Facebook

Published: 12 June 2011

    Abstract

    Facebook recently deployed Facebook Messages, its first ever user-facing application built on the Apache Hadoop platform. Apache HBase is a database-like layer built on Hadoop designed to support billions of messages per day. This paper describes the reasons why Facebook chose Hadoop and HBase over other systems such as Apache Cassandra and Voldemort and discusses the application's requirements for consistency, availability, partition tolerance, data model and scalability. We explore the enhancements made to Hadoop to make it a more effective realtime system, the tradeoffs we made while configuring the system, and how this solution has significant advantages over the sharded MySQL database scheme used in other applications at Facebook and many other web-scale companies. We discuss the motivations behind our design choices, the challenges that we face in day-to-day operations, and future capabilities and improvements still under development. We offer these observations on the deployment as a model for other companies who are contemplating a Hadoop-based solution over traditional sharded RDBMS deployments.

      Published In

      SIGMOD '11: Proceedings of the 2011 ACM SIGMOD International Conference on Management of data
      June 2011
      1364 pages
      ISBN:9781450306614
      DOI:10.1145/1989323
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. HBase
      2. data
      3. distributed systems
      4. hadoop
      5. hive
      6. scalability
      7. scribe

      Qualifiers

      • Research-article

      Conference

      SIGMOD/PODS '11
      Acceptance Rates

      Overall Acceptance Rate 785 of 4,003 submissions, 20%
