
SEALDB: An Efficient LSM-tree Based KV Store on SMR Drives with Sets and Dynamic Bands

Published: 01 November 2019

Abstract

Key-value (KV) stores play an increasingly critical role in supporting diverse large-scale applications in modern data centers, which host terabytes of KV items that may even reside on a single server due to virtualization. The combination of the ever-growing volume of KV items and storage/application consolidation is driving a trend toward high storage density for KV stores. Shingled Magnetic Recording (SMR) is a promising technology for increasing disk capacity, but it comes with the added complexity of handling random writes. To take full advantage of SMR drives, applications are expected to operate in an SMR-friendly way. In this work, we present SEALDB, a Log-Structured Merge-tree (LSM-tree) based key-value store specifically optimized for SMR drives by avoiding random writes and the resulting write amplification. First, SEALDB collects and groups the data participating in each LSM-tree compaction into sets. Using the set as the basic unit of compaction, SEALDB improves compaction efficiency by reducing random I/Os. Second, SEALDB creates variable-sized bands, named dynamic bands, on host-managed (HM) SMR drives. Dynamic bands store sets in an SMR-friendly way, eliminating the auxiliary write amplification that SMR drives would otherwise incur. Third, SEALDB employs two lightweight garbage collection (GC) policies to further improve space efficiency. We demonstrate the advantages of SEALDB via extensive experiments with various workloads. Overall, SEALDB delivers impressive performance compared with LevelDB, e.g., 3.42×/2.65× faster for random writes (without/with GCs) and 3.96× faster for sequential reads.
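The set-and-dynamic-band idea described in the abstract can be illustrated with a toy model. This is a minimal sketch only; all names, sizes, and interfaces below are hypothetical and are not SEALDB's actual code. The point it shows: the SSTables participating in one compaction are grouped into a set, and a band is then allocated at exactly the set's size, so the whole set is written to the SMR drive in one sequential pass with no fixed band geometry to read-modify-write.

```python
# Illustrative sketch of set-based compaction placement on dynamic bands.
# Hypothetical names and sizes; not SEALDB's real interfaces.

class DynamicBandAllocator:
    """Allocates variable-sized, append-only bands from a linear disk
    address space, so each compaction's output set is written with a
    single sequential pass."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.next_free = 0      # head of unallocated space
        self.bands = []         # (start, size, set_id) tuples

    def allocate_band(self, size, set_id):
        # Size the band to fit the whole set: no fixed band boundary
        # is crossed, hence no read-modify-write inside the band.
        if self.next_free + size > self.capacity:
            raise MemoryError("disk full; garbage collection needed")
        band = (self.next_free, size, set_id)
        self.bands.append(band)
        self.next_free += size
        return band


def compact_into_set(sstables):
    """Group the SSTables participating in one compaction into a set;
    the set (not the individual SSTable) is the unit of placement."""
    set_id = min(t["id"] for t in sstables)
    total = sum(t["size"] for t in sstables)
    return set_id, total


alloc = DynamicBandAllocator(capacity=1 << 30)   # 1 GiB toy disk
tables = [{"id": 7, "size": 4 << 20}, {"id": 9, "size": 6 << 20}]
set_id, size = compact_into_set(tables)
start, band_size, sid = alloc.allocate_band(size, set_id)
print(start, band_size, sid)   # 0 10485760 7
```

Because bands are sized per set rather than fixed, freeing a whole set frees a whole band, which is what makes the paper's lightweight GC policies possible.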




Published In

IEEE Transactions on Parallel and Distributed Systems, Volume 30, Issue 11
Nov. 2019
240 pages

Publisher

IEEE Press


Qualifiers

  • Research-article


Cited By
  • (2024) A Space-Grained Cleaning Method to Reduce Long-Tail Latency of DM-SMR Disks. ACM Transactions on Embedded Computing Systems, 23(2), 1-24. 5 Feb 2024. DOI: 10.1145/3643827
  • (2023) WA-Zone: Wear-Aware Zone Management Optimization for LSM-Tree on ZNS SSDs. ACM Transactions on Architecture and Code Optimization, 21(1), 1-23. 13 Dec 2023. DOI: 10.1145/3637488
  • (2023) MirrorKV: An Efficient Key-Value Store on Hybrid Cloud Storage with Balanced Performance of Compaction and Querying. Proceedings of the ACM on Management of Data, 1(4), 1-27. 12 Dec 2023. DOI: 10.1145/3626736
  • (2023) LaDy: Enabling Locality-aware Deduplication Technology on Shingled Magnetic Recording Drives. ACM Transactions on Embedded Computing Systems, 22(5s), 1-25. 31 Oct 2023. DOI: 10.1145/3607921
  • (2023) Meta-Block: Exploiting Cross-Layer and Direct Storage Access for Decentralized Blockchain Storage Systems. IEEE Transactions on Computers, 72(7), 2052-2064. 1 Jul 2023. DOI: 10.1109/TC.2022.3226305
  • (2021) SLA: A Cache Algorithm for SSD-SMR Storage System with Minimum RMWs. Algorithms and Architectures for Parallel Processing, 587-601. 3 Dec 2021. DOI: 10.1007/978-3-030-95391-1_37
