DOI: 10.1145/2213836.2213947
research-article

Walnut: a unified cloud object store

Published: 20 May 2012

Abstract

Walnut is an object store being developed at Yahoo! with the goal of serving as a common low-level storage layer for a variety of cloud data management systems, including Hadoop (a MapReduce system), MObStor (a multimedia serving system), and PNUTS (an extended key-value serving system). Thus, a key performance challenge is to meet the latency and throughput requirements of the wide range of workloads commonly observed across these diverse systems. The motivation for Walnut is to leverage a carefully optimized low-level storage system, with support for elasticity and high availability, across all of Yahoo!'s data clouds. This would enable hardware resources to be shared across hitherto siloed clouds of different types, offer greater potential for intelligent load balancing and efficient elastic operation, and simplify the operational tasks related to data storage.
In this paper, we discuss the motivation for unifying different storage clouds, describe the requirements of a common storage layer, and present the Walnut design, which uses a quorum-based replication protocol and one-hop direct client access to the data in most regular operations. A unique contribution of Walnut is its hybrid object strategy, which efficiently supports both small and large objects. We present experiments based on both synthetic and real data traces showing that Walnut works well over a wide range of workloads and can indeed serve as a common low-level storage layer across a range of cloud systems.
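The abstract names a hybrid object strategy that handles both small and large objects but does not say how the two cases are separated. One common way to build such a split, assumed here rather than taken from the paper, is to keep small values inline in the key index and append large values to a separate log, storing only a pointer in the index. The Python sketch below illustrates that idea on a single node; `HybridObjectStore`, `BlobPointer`, and the `LARGE_OBJECT_THRESHOLD` value are hypothetical names and parameters, not Walnut's API.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Union

# Assumed cutoff between "small" and "large" objects (hypothetical value;
# the abstract does not state Walnut's actual threshold).
LARGE_OBJECT_THRESHOLD = 64 * 1024  # bytes


@dataclass(frozen=True)
class BlobPointer:
    """Location of a large object's bytes inside the append-only blob log."""
    offset: int
    length: int


class HybridObjectStore:
    """Toy single-node store: small values inline in the index, large ones in a blob log."""

    def __init__(self, threshold: int = LARGE_OBJECT_THRESHOLD) -> None:
        self.threshold = threshold
        # Stand-in for an LSM-style key index; a dict keeps the sketch short.
        self.index: Dict[str, Union[bytes, BlobPointer]] = {}
        # Stand-in for an append-only file that holds large object bodies.
        self.blob_log = bytearray()

    def put(self, key: str, value: bytes) -> None:
        if len(value) < self.threshold:
            # Small object: store the bytes directly in the index entry.
            self.index[key] = value
        else:
            # Large object: append the body to the blob log and index only a
            # pointer, so index maintenance never has to copy the large body.
            offset = len(self.blob_log)
            self.blob_log.extend(value)
            self.index[key] = BlobPointer(offset, len(value))

    def get(self, key: str) -> Optional[bytes]:
        entry = self.index.get(key)
        if entry is None:
            return None  # unknown key
        if isinstance(entry, BlobPointer):
            # Follow the pointer into the blob log.
            return bytes(self.blob_log[entry.offset:entry.offset + entry.length])
        return entry


# Usage with a tiny threshold so both paths are exercised.
store = HybridObjectStore(threshold=8)
store.put("thumb", b"tiny")     # 4 bytes: kept inline in the index
store.put("video", b"x" * 100)  # 100 bytes: appended to the blob log
```

Keeping large bodies out of the index means index maintenance (for example, compaction in an LSM-style structure) copies only small entries and pointers. The trade-off is that overwriting or deleting a large object leaves dead bytes in the blob log, which a real system would have to reclaim with some form of garbage collection; the sketch omits that step.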


Information

Published In

SIGMOD '12: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data
May 2012
886 pages
ISBN:9781450312479
DOI:10.1145/2213836
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. cloud storage
  2. hybrid object store
  3. paxos-based replication

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '12

Acceptance Rates

SIGMOD '12 Paper Acceptance Rate 48 of 289 submissions, 17%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Article Metrics

  • Downloads (last 12 months): 18
  • Downloads (last 6 weeks): 3
Reflects downloads up to 21 Sep 2024

Cited By

  • (2022) LogStore: A Workload-Aware, Adaptable Key-Value Store on Hybrid Storage Systems. IEEE Transactions on Knowledge and Data Engineering, 34:8, pp. 3867-3882. DOI: 10.1109/TKDE.2020.3027191. Online publication date: 1-Aug-2022.
  • (2019) Transparent Throughput Elasticity for Modern Cloud Storage. In Applying Integration Techniques and Methods in Distributed Systems and Technologies, pp. 156-191. DOI: 10.4018/978-1-5225-8295-3.ch007. Online publication date: 2019.
  • (2019) DirectLoad: A Fast Web-Scale Index System Across Large Regional Centers. In 2019 IEEE 35th International Conference on Data Engineering (ICDE), pp. 1790-1801. DOI: 10.1109/ICDE.2019.00195. Online publication date: Apr-2019.
  • (2018) Optimal Bloom Filters and Adaptive Merging for LSM-Trees. ACM Transactions on Database Systems, 43:4, pp. 1-48. DOI: 10.1145/3276980. Online publication date: 8-Dec-2018.
  • (2018) Size Matters. In Proceedings of the 19th International Middleware Conference, pp. 26-39. DOI: 10.1145/3274808.3274811. Online publication date: 26-Nov-2018.
  • (2018) Dostoevsky. In Proceedings of the 2018 International Conference on Management of Data, pp. 505-520. DOI: 10.1145/3183713.3196927. Online publication date: 27-May-2018.
  • (2017) SlimDB. Proceedings of the VLDB Endowment, 10:13, pp. 2037-2048. DOI: 10.14778/3151106.3151108. Online publication date: 1-Sep-2017.
  • (2017) WiscKey. ACM Transactions on Storage, 13:1, pp. 1-28. DOI: 10.1145/3033273. Online publication date: 2-Mar-2017.
  • (2017) Building an Efficient Put-Intensive Key-Value Store with Skip-Tree. IEEE Transactions on Parallel and Distributed Systems, 28:4, pp. 961-973. DOI: 10.1109/TPDS.2016.2609912. Online publication date: 1-Apr-2017.
  • (2017) IoT-Based Big Data Storage Systems in Cloud Computing: Perspectives and Challenges. IEEE Internet of Things Journal, 4:1, pp. 75-87. DOI: 10.1109/JIOT.2016.2619369. Online publication date: Feb-2017.
