DOI: 10.1145/2882903.2903738
research-article

Ambry: LinkedIn's Scalable Geo-Distributed Object Store

Published: 14 June 2016

Abstract

The infrastructure beneath a worldwide social network has to continually serve billions of variable-sized media objects such as photos, videos, and audio clips. These objects must be stored and served with low latency and high throughput by a system that is geo-distributed, highly scalable, and load-balanced. Existing file systems and object stores face several challenges when serving such large objects. We present Ambry, a production-quality system for storing large immutable data (called blobs). Ambry is designed in a decentralized way and leverages techniques such as logical blob grouping, asynchronous replication, rebalancing mechanisms, zero-cost failure detection, and OS caching. Ambry has been running in LinkedIn's production environment for the past 2 years, serving up to 10K requests per second across more than 400 million users. Our experimental evaluation reveals that Ambry offers high efficiency (utilizing up to 88% of the network bandwidth), low latency (less than 50 ms latency for a 1 MB object), and load balancing (improving imbalance of request rate among disks by 8x-10x).

Published In

SIGMOD '16: Proceedings of the 2016 International Conference on Management of Data
June 2016
2300 pages
ISBN: 9781450335317
DOI: 10.1145/2882903

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. geographically distributed
  2. load balancing
  3. object store
  4. scalable

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'16: International Conference on Management of Data
June 26 - July 1, 2016
San Francisco, California, USA

Acceptance Rates

Overall acceptance rate: 785 of 4,003 submissions (20%)
