research-article

Building GC-free Key-value Store on HM-SMR Drives with ZoneFS

Published: 24 August 2022

Abstract

Host-managed shingled magnetic recording (HM-SMR) drives offer the capacity needed to keep pace with the explosive growth of data. For key-value (KV) stores based on log-structured merge trees (LSM-trees), the HM-SMR drive is an ideal solution owing to its capacity, predictable performance, and economical cost. However, building an LSM-tree-based KV store on HM-SMR drives presents severe challenges in maintaining performance and space utilization efficiency, because cleaning is performed redundantly by both the application and the storage device (i.e., compaction and garbage collection, respectively). To eliminate the overhead of on-disk garbage collection (GC) and improve compaction efficiency, this article presents GearDB, a GC-free KV store tailored for HM-SMR drives. GearDB improves write performance and space efficiency through three new techniques: a new on-disk data layout, compaction windows, and a novel gear compaction algorithm. We further improve the read performance of GearDB with a new SSTable layout and a read-ahead mechanism. We implement GearDB with LevelDB and use zonefs to access a real HM-SMR drive. Our extensive experiments confirm that GearDB achieves both high performance and space efficiency: on average, it is 1.7× and 1.5× better than LevelDB in random write and read, respectively, with up to 86.9% space efficiency.
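
The abstract does not give implementation details, so the following is only a toy illustration of why on-disk garbage collection arises on a zoned drive, not GearDB's layout or algorithm. It models one sequential-write zone: data can only be appended at a write pointer, and space is reclaimed only by resetting the whole zone. The names `Zone`, `invalidate`, and `space_utilization` are hypothetical, and `space_utilization` is not the paper's space-efficiency metric. A real drive's reset simply discards the zone's contents; this model refuses instead, to make the need to migrate live data visible.

```python
class Zone:
    """Toy model of one sequential-write zone (sizes in blocks). Hypothetical."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.write_pointer = 0  # next block that may be written
        self.valid = set()      # blocks that still hold live data

    def append(self, nblocks):
        """Write nblocks at the write pointer and return the first block written."""
        if self.write_pointer + nblocks > self.capacity:
            raise ValueError("zone full")
        start = self.write_pointer
        self.valid.update(range(start, start + nblocks))
        self.write_pointer += nblocks
        return start

    def invalidate(self, start, nblocks):
        """Mark blocks dead, e.g. an SSTable deleted after compaction."""
        self.valid.difference_update(range(start, start + nblocks))

    def reset(self):
        """Rewind the write pointer; this model refuses if live data remains."""
        if self.valid:
            raise RuntimeError("live data must be migrated first (this is GC)")
        self.write_pointer = 0


def space_utilization(zone):
    """Fraction of written blocks that still hold live data (illustrative metric)."""
    if zone.write_pointer == 0:
        return 1.0
    return len(zone.valid) / zone.write_pointer


# Two tables share a zone; compaction later deletes the first one.
zone = Zone(capacity=8)
first = zone.append(4)
second = zone.append(4)
zone.invalidate(first, 4)
```

In this model the zone is full but only half of it holds live data, and it cannot be reused until the second table is either invalidated or copied elsewhere. That copy is the garbage-collection cost the abstract refers to.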

Published In

ACM Transactions on Storage, Volume 18, Issue 3
August 2022
244 pages
ISSN: 1553-3077
EISSN: 1553-3093
DOI: 10.1145/3555792
Editor: Sam H. Noh

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 24 August 2022
Online AM: 22 July 2022
Accepted: 01 November 2021
Revised: 01 October 2021
Received: 01 December 2020
Published in TOS Volume 18, Issue 3

Author Tags

  1. Key-value store
  2. host-managed SMR
  3. LSM-Tree
  4. garbage collection
  5. gear compaction

Qualifiers

  • Research-article
  • Refereed

Funding Sources

  • Creative Research Group Project of NSFC
  • National Natural Science Foundation of China
  • National Key Research and Development Program of China

Article Metrics

  • Downloads (Last 12 months): 186
  • Downloads (Last 6 weeks): 14

Reflects downloads up to 22 Jan 2025

Cited By

  • (2024) XFall: Domain Adaptive Wi-Fi-Based Fall Detection With Cross-Modal Supervision. IEEE Journal on Selected Areas in Communications 42:9 (2457-2471). DOI: 10.1109/JSAC.2024.3413997. Online publication date: 14-Jun-2024
  • (2024) Dynamic zone redistribution for key-value stores on zoned namespaces SSDs. Journal of Systems Architecture 152 (103159). DOI: 10.1016/j.sysarc.2024.103159. Online publication date: Jul-2024
  • (2024) UnifiedSC: a unified framework via collaborative optimization for multi-task person re-identification. Applied Intelligence 54:4 (2962-2975). DOI: 10.1007/s10489-024-05333-0. Online publication date: 22-Feb-2024
  • (2023) WALTZ: Leveraging Zone Append to Tighten the Tail Latency of LSM Tree on ZNS SSD. Proceedings of the VLDB Endowment 16:11 (2884-2896). DOI: 10.14778/3611479.3611495. Online publication date: 1-Jul-2023
  • (2023) YuYin: a multi-task learning model of multi-modal e-commerce background music recommendation. EURASIP Journal on Audio, Speech, and Music Processing 2023:1. DOI: 10.1186/s13636-023-00306-6. Online publication date: 19-Oct-2023
  • (2023) A Feature Map is Worth a Video Frame: Rethinking Convolutional Features for Visible-Infrared Person Re-identification. ACM Transactions on Multimedia Computing, Communications, and Applications 20:2 (1-20). DOI: 10.1145/3617375. Online publication date: 18-Oct-2023
  • (2023) Collocated Clothing Synthesis with GANs Aided by Textual Information: A Multi-Modal Framework. ACM Transactions on Multimedia Computing, Communications, and Applications 20:1 (1-25). DOI: 10.1145/3614097. Online publication date: 18-Sep-2023
  • (2023) SplitZNS: Towards an Efficient LSM-Tree on Zoned Namespace SSDs. ACM Transactions on Architecture and Code Optimization 20:3 (1-26). DOI: 10.1145/3608476. Online publication date: 7-Aug-2023
  • (2023) LaDy: Enabling Locality-aware Deduplication Technology on Shingled Magnetic Recording Drives. ACM Transactions on Embedded Computing Systems 22:5s (1-25). DOI: 10.1145/3607921. Online publication date: 9-Sep-2023
  • (2023) Attentional Composition Networks for Long-Tailed Human Action Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 20:1 (1-18). DOI: 10.1145/3603253. Online publication date: 9-Jun-2023