research-article

Building Efficient Key-Value Stores via a Lightweight Compaction Tree

Published: 24 November 2017

Abstract

The Log-Structured Merge tree (LSM-tree) has been one of the mainstream indexes in key-value systems supporting a variety of write-intensive Internet applications in today’s data centers. However, the performance of the LSM-tree is seriously hampered by constantly occurring compaction procedures, which incur significant write amplification and degrade write throughput. To alleviate the performance degradation caused by compactions, we introduce the lightweight compaction tree (LWC-tree), a variant of the LSM-tree index optimized for minimizing write amplification and maximizing system throughput. Lightweight compaction drastically decreases write amplification by appending data to a table and merging only the metadata, which is much smaller. Using our proposed LWC-tree, we have implemented three key-value LWC-stores on different storage media: Shingled Magnetic Recording (SMR) drives, Solid State Drives (SSDs), and conventional Hard Disk Drives (HDDs). The LWC-store is particularly optimized for SMR drives, as it eliminates the multiplicative I/O amplification from both LSM-trees and SMR drives. Due to the lightweight compaction procedure, the LWC-store reduces write amplification by a factor of up to 5× compared to the popular LevelDB key-value store. Moreover, the random write throughput of the LWC-tree on SMR drives is significantly improved, by up to 467%, even compared with LevelDB on conventional HDDs. Furthermore, the LWC-tree has wide applicability and delivers impressive performance improvements in various conditions, including different storage media (i.e., SMR, HDD, SSD) and various value sizes and access patterns (i.e., uniform and Zipfian).
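
The contrast the abstract draws between conventional compaction and lightweight compaction can be illustrated with a toy byte-accounting model. This is only a sketch under stated assumptions, not the authors' implementation: the `Table` type, the function names, and all sizes are hypothetical, and the model assumes that conventional compaction rewrites every overlapped table in full while lightweight compaction appends the incoming data once and rewrites only the metadata of the overlapped tables.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Table:
    """Toy on-disk table: a large data area plus a much smaller metadata area."""
    data_bytes: int
    meta_bytes: int


def conventional_compaction_bytes(victim: Table, overlapped: Sequence[Table]) -> int:
    """Bytes written when the victim is merge-sorted with every overlapped
    table and all of them are rewritten in full (data and metadata)."""
    rewritten = sum(t.data_bytes + t.meta_bytes for t in overlapped)
    return victim.data_bytes + victim.meta_bytes + rewritten


def lightweight_compaction_bytes(victim: Table, overlapped: Sequence[Table]) -> int:
    """Bytes written when the victim's data is only appended to the overlapped
    tables and just the small metadata of those tables is merged and rewritten."""
    appended_data = victim.data_bytes
    merged_meta = victim.meta_bytes + sum(t.meta_bytes for t in overlapped)
    return appended_data + merged_meta


def write_amplification(bytes_written: int, user_bytes: int) -> float:
    """Ratio of bytes written to storage to bytes of user data moved."""
    return bytes_written / user_bytes


# Made-up sizes for illustration only: 1 victim table, 10 overlapped tables.
victim = Table(data_bytes=4000, meta_bytes=40)
overlapped = [Table(data_bytes=4000, meta_bytes=40) for _ in range(10)]

for name, cost in (("conventional", conventional_compaction_bytes),
                   ("lightweight", lightweight_compaction_bytes)):
    written = cost(victim, overlapped)
    print(f"{name}: {written} bytes written, "
          f"amplification {write_amplification(written, victim.data_bytes):.2f}")
```

With these invented sizes the model writes 44440 bytes in the conventional case and 4440 bytes in the lightweight case. The figures only show why merging metadata instead of data shrinks the write volume; they are not the measurements reported in the article.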

Information & Contributors

Information

Published In

ACM Transactions on Storage, Volume 13, Issue 4
Special Issue on MSST 2017 and Regular Papers
November 2017
329 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3160863
Editor: Sam H. Noh
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 November 2017
Accepted: 01 September 2017
Received: 01 August 2017
Published in TOS Volume 13, Issue 4

Author Tags

  1. LSM-tree
  2. Lightweight compaction
  3. SMR drives
  4. write amplification

Qualifiers

  • Research-article
  • Research
  • Refereed

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months): 67
  • Downloads (Last 6 weeks): 1
Reflects downloads up to 13 Sep 2024

Citations

Cited By

  • (2024) HyperDB: a Novel Key Value Store for Reducing Background Traffic in Heterogeneous SSD Storage. Proceedings of the 53rd International Conference on Parallel Processing, 453-463. DOI: 10.1145/3673038.3673153. Online publication date: 12-Aug-2024
  • (2024) CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage Disaggregated Infrastructure. Proceedings of the ACM on Management of Data 2:3, 1-28. DOI: 10.1145/3654927. Online publication date: 30-May-2024
  • (2024) gLSM: Using GPGPU to Accelerate Compactions in LSM-tree-based Key-value Stores. ACM Transactions on Storage 20:1, 1-41. DOI: 10.1145/3633782. Online publication date: 30-Jan-2024
  • (2024) An LSM Tree Augmented with B+ Tree on Nonvolatile Memory. ACM Transactions on Storage 20:1, 1-24. DOI: 10.1145/3633475. Online publication date: 30-Jan-2024
  • (2024) SplitDB: Closing the Performance Gap for LSM-Tree-Based Key-Value Stores. IEEE Transactions on Computers 73:1, 206-220. DOI: 10.1109/TC.2023.3326982. Online publication date: 1-Jan-2024
  • (2024) SolsDB: Solve the Ethereum’s bottleneck caused by storage engine. Future Generation Computer Systems 160, 295-304. DOI: 10.1016/j.future.2024.05.050. Online publication date: Nov-2024
  • (2024) MTDB: an LSM-tree-based key-value store using a multi-tree structure to improve read performance. The Journal of Supercomputing. DOI: 10.1007/s11227-024-06382-5. Online publication date: 1-Aug-2024
  • (2023) MyWAL: performance optimization by removing redundant input/output stack in key-value store (MyWAL: 一种基于精简输入输出堆栈的键值存储系统性能优化方案). Frontiers of Information Technology & Electronic Engineering 24:7, 980-993. DOI: 10.1631/FITEE.2200496. Online publication date: 28-Jul-2023
  • (2023) Learning to Optimize LSM-trees: Towards A Reinforcement Learning based Key-Value Store for Dynamic Workloads. Proceedings of the ACM on Management of Data 1:3, 1-25. DOI: 10.1145/3617333. Online publication date: 13-Nov-2023
  • (2023) Tidal-Tree-Mem: Toward Read-Intensive Key-Value Stores With Tidal Structure Based on LSM-Tree. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 42:2, 423-436. DOI: 10.1109/TCAD.2022.3177575. Online publication date: Feb-2023
