FlatLSM: Write-Optimized LSM-Tree for PM-Based KV Stores

Published: 06 March 2023

Abstract

The Log-Structured Merge Tree (LSM-Tree) is widely used in key-value (KV) stores because of its excellent write performance. However, LSM-Tree-based KV stores still suffer from the overhead of the write-ahead log and from write stalls caused by slow L0 flush and L0-L1 compaction. New byte-addressable persistent memory (PM) devices bring an opportunity to improve the write performance of the LSM-Tree. Previous studies on PM-based LSM-Trees have not fully exploited PM’s “dual role” of main memory and external storage. In this article, we analyze two strategies for PM-based memtables and the reasons write stall problems occur in the first place. Inspired by the analysis result, we propose FlatLSM, a specially designed flat LSM-Tree for non-volatile memory based KV stores. First, we propose PMTable with separated index and data. The PM Log utilizes the Buffer Log to store KVs of size less than 256B. Second, to solve the write stall problem, FlatLSM merges the volatile memtables and the persistent L0 into large PMTables, which can reduce the depth of the LSM-Tree and concentrate I/O bandwidth on L0-L1 compaction. To mitigate write stalls caused by flushing large PMTables to SSD, we propose a parallel flush/compaction algorithm based on KV separation. We implemented FlatLSM based on RocksDB and evaluated its performance on Intel’s latest PM device, the Intel Optane DC PMM. Compared with the state-of-the-art PM-based LSM-Tree KV stores, FlatLSM improves throughput by 5.2× on a random write workload and by 2.55× on YCSB-A.
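The idea of a table with separated index and data can be illustrated with a small sketch. This is not the PMTable design from the article, whose internals are not given in the abstract: the class name `SeparatedTable`, the record layout, and the use of a `bytearray` as a stand-in for persistent memory are all assumptions made for illustration. The sketch appends KV records to a log in arrival order and keeps a separate sorted index of keys pointing at log offsets, so writes stay sequential while ordered iteration (as a flush to a sorted run would need) remains possible.

```python
import bisect
import struct


class SeparatedTable:
    """Toy table: append-only data log plus a separate sorted index.

    Assumption: a bytearray stands in for persistent memory. Nothing here
    is taken from the FlatLSM implementation.
    """

    _HEADER = struct.Struct("<II")  # key length, value length

    def __init__(self):
        self.log = bytearray()  # data: records appended in arrival order
        self.keys = []          # index: distinct keys kept in sorted order
        self.offsets = {}       # index: key -> offset of its newest record

    def put(self, key: bytes, value: bytes) -> None:
        offset = len(self.log)
        # The write path only appends; older versions are never overwritten.
        self.log += self._HEADER.pack(len(key), len(value)) + key + value
        if key not in self.offsets:
            bisect.insort(self.keys, key)
        self.offsets[key] = offset

    def get(self, key: bytes):
        offset = self.offsets.get(key)
        if offset is None:
            return None
        return self._read(offset)[1]

    def _read(self, offset: int):
        klen, vlen = self._HEADER.unpack_from(self.log, offset)
        start = offset + self._HEADER.size
        key = bytes(self.log[start:start + klen])
        value = bytes(self.log[start + klen:start + klen + vlen])
        return key, value

    def sorted_items(self):
        """Yield the newest (key, value) pairs in key order."""
        for key in self.keys:
            yield self._read(self.offsets[key])
```

In this sketch an update to an existing key only appends a new record and redirects the index entry, which is the property that keeps the data path sequential; reclaiming the space of stale records is left out.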

Cited By

  • (2024) A Vehicle Matching Algorithm by Maximizing Travel Time Probability Based on Automatic License Plate Recognition Data. IEEE Transactions on Intelligent Transportation Systems 25:8 (9103-9114). DOI: 10.1109/TITS.2024.3358625. Online publication date: 1-Aug-2024
  • (2024) Time-tired compaction. Advanced Engineering Informatics 59:C. DOI: 10.1016/j.aei.2023.102224. Online publication date: 2-Jul-2024
  • (2024) Redesigning Data and Metadata Updates in PM File Systems with Persistent CPU Caches. Database Systems for Advanced Applications (453-462). DOI: 10.1007/978-981-97-5572-1_33. Online publication date: 2-Jul-2024

Published In

ACM Transactions on Storage, Volume 19, Issue 2
May 2023
269 pages
ISSN:1553-3077
EISSN:1553-3093
DOI:10.1145/3585541

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 March 2023
Online AM: 21 January 2023
Accepted: 22 November 2022
Revised: 26 September 2022
Received: 26 April 2022
Published in TOS Volume 19, Issue 2

Author Tags

  1. Persistent memory
  2. key-value stores
  3. LSM-Tree

Qualifiers

  • Research-article

Funding Sources

  • NSF of China
  • Fundamental Research Funds for the Central Universities
