research-article

WALTZ: Leveraging Zone Append to Tighten the Tail Latency of LSM Tree on ZNS SSD

Published: 01 July 2023

Abstract

We propose WALTZ, an LSM tree-based key-value store on the emerging Zoned Namespace (ZNS) SSD. The key contribution of WALTZ is to leverage the zone append command, a recent addition to the ZNS SSD specifications, to provide tight tail latency. The long tail latency caused by the process of merging multiple parallel writes, called batch-group writes, is effectively addressed by the internal synchronization mechanism of the ZNS SSD. To provide fast failover when the active zone of a write-ahead log (WAL) file becomes full during parallel appends, WALTZ introduces a mechanism for WAL zone replacement and reservation. Finally, lazy metadata management allows a put query to be processed quickly without any other synchronization, enabling lock-free execution of individual append commands. For evaluation, we use both microbenchmarks (db_bench) with varying read/write ratios and key skewness, and realistic social-graph workloads (MixGraph from Facebook). Our evaluation demonstrates a geomean reduction in tail latency of 2.19× and 2.45× for db_bench and MixGraph, respectively, with maximum reductions of 3.02× and 4.73×. As a side effect of eliminating the overhead of batch-group writes, WALTZ also improves query throughput (QPS) by up to 11.7%.
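
The zone append idea in the abstract can be illustrated with a toy model: each writer hands its record to the device, the device chooses the offset and returns it, and writers switch to a reserved zone when the active one fills up. The sketch below is not WALTZ's implementation. The names `Zone`, `WalWriter`, and `run_demo` are hypothetical, the zones live in memory, and a Python lock stands in for the SSD's internal ordering of append commands. The small lock around the zone switch is also a simplification of this sketch, not a claim about how WALTZ performs zone replacement.

```python
import threading


class Zone:
    """Toy in-memory zone (assumption: not a real ZNS device interface)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.write_pointer = 0
        self.blocks = []
        # Stands in for the device's internal serialization of appends.
        self._device_lock = threading.Lock()

    def append(self, record):
        """Zone-append semantics: the device picks the offset and returns it.

        Returns None when the zone is full.
        """
        with self._device_lock:
            if self.write_pointer >= self.capacity:
                return None
            offset = self.write_pointer
            self.blocks.append(record)
            self.write_pointer += 1
            return offset


class WalWriter:
    """Hypothetical WAL writer that fails over to a pre-reserved zone."""

    def __init__(self, zone_capacity, num_zones):
        self.zones = [Zone(zone_capacity) for _ in range(num_zones)]
        self.active = 0
        self._switch_lock = threading.Lock()  # used only on a zone switch

    def put(self, record):
        """Append a record; writers do not agree on offsets beforehand."""
        while True:
            idx = self.active
            offset = self.zones[idx].append(record)
            if offset is not None:
                return idx, offset
            # Active zone is full: advance to the next reserved zone once.
            with self._switch_lock:
                if self.active == idx:
                    if idx + 1 >= len(self.zones):
                        raise RuntimeError("no reserved zone left")
                    self.active = idx + 1


def run_demo(num_threads=4, per_thread=25, zone_capacity=40, num_zones=3):
    """Run concurrent puts and return the writer and all (zone, offset) pairs."""
    writer = WalWriter(zone_capacity, num_zones)
    results = [[] for _ in range(num_threads)]

    def worker(tid):
        for i in range(per_thread):
            results[tid].append(writer.put((tid, i)))

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return writer, [loc for per in results for loc in per]


if __name__ == "__main__":
    writer, locations = run_demo()
    print(len(set(locations)), [z.write_pointer for z in writer.zones])
```

In this model every record receives a distinct (zone, offset) location even though the writer threads never coordinate on offsets, which is the property that makes a host-side merge step unnecessary.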

Published In

Proceedings of the VLDB Endowment  Volume 16, Issue 11
July 2023
789 pages
ISSN:2150-8097

Publisher

VLDB Endowment
