ZWAL: Rethinking Write-ahead Logs for ZNS SSDs with Zone Appends

Published: 14 August 2024

Abstract

Key-value (KV) stores are widely used databases that require performance stability. Zoned Namespace (ZNS) is an emerging interface for flash storage devices that provides such stability. Because of their sequential write access patterns, LSM-trees, ubiquitous data structures in KV-stores, are a natural fit for the append-only ZNS interface. However, LSM-trees achieve limited write throughput on ZNS. The limitation arises because the largest portion of LSM-tree writes are small writes to the write-ahead log (WAL) component, and ZNS has limited performance for small write I/O. The ZNS-specific zone append operation presents a solution, enhancing the throughput of small sequential writes. Still, zone appends are challenging to use in WALs: the storage device is allowed to reorder the data of zone appends, and WAL recovery does not support such reordering. Therefore, we need to change the WAL design to support it.
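To make the reordering problem concrete, the following is a minimal sketch of one generic way a log can tolerate out-of-order placement: each record carries a sequence number, and recovery replays records in sequence order up to the first gap. This illustrates the general idea only and is not claimed to be the ZWAL design; the record header layout and the names `encode_record`, `simulate_zone_appends`, and `recover` are hypothetical, and the "zone" is an in-memory byte string, not a real device.

```python
import random
import struct
from typing import Dict, List, Tuple

# Hypothetical record header: 8-byte sequence number, 4-byte payload length.
HEADER = struct.Struct("<QI")


def encode_record(seq: int, payload: bytes) -> bytes:
    """Prefix the payload with its sequence number and length."""
    return HEADER.pack(seq, len(payload)) + payload


def simulate_zone_appends(records: List[bytes], rng: random.Random) -> bytes:
    """Model a device that may place concurrently issued appends in any order."""
    placed = list(records)
    rng.shuffle(placed)
    return b"".join(placed)


def recover(zone: bytes) -> List[Tuple[int, bytes]]:
    """Scan the zone, then return records in sequence order up to the first gap."""
    found: Dict[int, bytes] = {}
    offset = 0
    while offset + HEADER.size <= len(zone):
        seq, length = HEADER.unpack_from(zone, offset)
        start = offset + HEADER.size
        end = start + length
        if end > len(zone):
            break  # incomplete record at the tail; ignore it
        found[seq] = zone[start:end]
        offset = end

    ordered: List[Tuple[int, bytes]] = []
    expected = 0  # assumption: sequence numbers start at 0 for each log
    while expected in found:
        ordered.append((expected, found[expected]))
        expected += 1
    return ordered


if __name__ == "__main__":
    payloads = [b"put k1=v1", b"put k2=v2", b"delete k1"]
    records = [encode_record(i, p) for i, p in enumerate(payloads)]
    zone = simulate_zone_appends(records, random.Random(42))
    for seq, payload in recover(zone):
        print(seq, payload.decode())
```

Stopping at the first missing sequence number keeps replay consistent with the order in which writes were issued, at the cost of discarding any later records that landed before the missing one was persisted.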


Published In

ACM SIGOPS Operating Systems Review, Volume 58, Issue 1
June 2024
60 pages
DOI: 10.1145/3689051
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

Publisher

Association for Computing Machinery

New York, NY, United States

