DOI: 10.1145/3035918.3064054

Monkey: Optimal Navigable Key-Value Store

Published: 09 May 2017

Abstract

In this paper, we show that key-value stores backed by an LSM-tree exhibit an intrinsic trade-off between lookup cost, update cost, and main memory footprint, yet all existing designs expose a suboptimal and difficult to tune trade-off among these metrics. We pinpoint the problem to the fact that all modern key-value stores suboptimally co-tune the merge policy, the buffer size, and the Bloom filters' false positive rates in each level.
We present Monkey, an LSM-based key-value store that strikes the optimal balance between the costs of updates and lookups with any given main memory budget. The insight is that worst-case lookup cost is proportional to the sum of the false positive rates of the Bloom filters across all levels of the LSM-tree. Contrary to state-of-the-art key-value stores that assign a fixed number of bits-per-element to all Bloom filters, Monkey allocates memory to filters across different levels so as to minimize this sum. We show analytically that Monkey reduces the asymptotic complexity of the worst-case lookup I/O cost, and we verify empirically using an implementation on top of LevelDB that Monkey reduces lookup latency by an increasing margin as the data volume grows (50%-80% for the data sizes we experimented with). Furthermore, we map the LSM-tree design space onto a closed-form model that enables co-tuning the merge policy, the buffer size and the filters' false positive rates to trade among lookup cost, update cost and/or main memory, depending on the workload (proportion of lookups and updates), the dataset (number and size of entries), and the underlying hardware (main memory available, disk vs. flash). We show how to use this model to answer what-if design questions about how changes in environmental parameters impact performance and how to adapt the various LSM-tree design elements accordingly.
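The allocation idea in the abstract can be illustrated with a small sketch. It is not the paper's implementation or its closed-form model. It assumes the standard Bloom filter approximation (false positive rate = exp(-bits_per_entry · (ln 2)²)), levels whose sizes grow by a fixed ratio, and a fixed total filter memory budget. Under those assumptions, minimizing the sum of false positive rates with a Lagrange multiplier gives rates proportional to level size, capped at 1 (a level capped at 1 gets no filter). All function names below are hypothetical.

```python
import math

LN2_SQ = math.log(2) ** 2  # (ln 2)^2 from the standard Bloom filter approximation


def bloom_fpr(bits_per_entry):
    """Approximate false positive rate of a Bloom filter with optimal hashing."""
    return math.exp(-bits_per_entry * LN2_SQ)


def bits_needed(sizes, fprs):
    """Total filter bits required to reach the given per-level rates."""
    return sum(n * math.log(1.0 / p) for n, p in zip(sizes, fprs)) / LN2_SQ


def uniform_fprs(sizes, total_bits):
    """Baseline: the same bits-per-entry for every level's filter."""
    bits_per_entry = total_bits / sum(sizes)
    return [bloom_fpr(bits_per_entry) for _ in sizes]


def min_sum_fprs(sizes, total_bits):
    """Minimize sum(p_i) subject to bits_needed(sizes, p) == total_bits, p_i <= 1.

    The stationary point is p_i = c * n_i. If that would push the largest
    level above 1, that level is left unfiltered (p = 1) and the budget is
    re-solved over the remaining, smaller levels.
    """
    order = sorted(range(len(sizes)), key=lambda i: sizes[i])  # smallest first
    fprs = [1.0] * len(sizes)
    for k in range(len(sizes), 0, -1):
        active = order[:k]
        n_total = sum(sizes[i] for i in active)
        weighted_log = sum(sizes[i] * math.log(sizes[i]) for i in active)
        log_c = -(total_bits * LN2_SQ + weighted_log) / n_total
        # Feasible when the largest active level has c * n <= 1.
        if log_c + math.log(sizes[active[-1]]) <= 0.0:
            c = math.exp(log_c)
            for i in active:
                fprs[i] = c * sizes[i]
            return fprs
    return fprs


# Illustrative numbers only: 5 levels, size ratio 10, 5 bits per entry overall.
sizes = [1000 * 10 ** i for i in range(5)]
budget = 5 * sum(sizes)
uniform = uniform_fprs(sizes, budget)
tuned = min_sum_fprs(sizes, budget)
print("sum of FPRs, uniform bits-per-entry:", sum(uniform))
print("sum of FPRs, min-sum allocation:   ", sum(tuned))
```

With the same memory budget, the min-sum allocation gives the small levels very low false positive rates and lets the largest level carry most of the sum, so the total is lower than with uniform bits-per-entry. The level sizes and budget here are made up for illustration and do not come from the paper's experiments.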

Published In

SIGMOD '17: Proceedings of the 2017 ACM International Conference on Management of Data
May 2017
1810 pages
ISBN: 9781450341974
DOI: 10.1145/3035918

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. adaptivity
  2. auto-tuning
  3. bloom filters
  4. key-value store
  5. log-structured merge-tree
  6. lsm-tree
  7. memory hierarchy
  8. point lookups
  9. point queries
  10. read/write/memory trade-off

Qualifiers

  • Research-article

Conference

SIGMOD/PODS'17
Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
