DOI: 10.1145/2592798.2592804
research-article

An efficient design and implementation of LSM-tree based key-value store on open-channel SSD

Published: 14 April 2014

Abstract

Key-value (KV) stores are widely used for data management in Internet services because they offer higher efficiency, scalability, and availability than relational database systems. KV stores based on the log-structured merge tree (LSM-tree) have attracted growing attention because they eliminate random writes while maintaining acceptable read performance. As the price per unit capacity of NAND flash has fallen, solid state disks (SSDs) have been widely adopted in enterprise-scale data centers to provide high I/O bandwidth and low access latency. However, naively combining LSM-tree-based KV stores with SSDs is inefficient, because the high parallelism available inside the SSD cannot be fully exploited: current LSM-tree-based KV stores are designed without regard to the SSD's multi-channel architecture.
To address this inadequacy, we propose LOCS, a system that pairs an LSM-tree-based KV store (LevelDB in this work) with a customized SSD that exposes its internal flash channels to applications. We extend LevelDB to explicitly use the SSD's multiple channels and exploit its abundant parallelism. In addition, we optimize the scheduling and dispatching policies for concurrent I/O requests to further improve the efficiency of data access. Compared with stock LevelDB running on a conventional SSD, the throughput of the storage system improves by more than 4X once all proposed optimization techniques are applied.
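
The abstract does not spell out the dispatching policies, so the following is only an illustrative sketch of why load-aware dispatching across exposed flash channels can matter. It compares a plain round-robin dispatcher with one that sends each request to the channel whose pending work is smallest. The `Channel` class, the function names, and the per-operation costs in `COST` are all assumptions made for this example; none of them come from the paper or from LevelDB.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical relative costs per operation (arbitrary units, assumed here).
COST = {"read": 1, "write": 4, "erase": 12}


@dataclass
class Channel:
    """One flash channel with its own queue of pending requests."""
    ident: int
    queue: List[str] = field(default_factory=list)

    def load(self) -> int:
        # Weighted queue length: total assumed cost of the pending requests.
        return sum(COST[op] for op in self.queue)


def dispatch_round_robin(requests, channels):
    """Baseline: request i goes to channel i mod N, regardless of load."""
    for i, op in enumerate(requests):
        channels[i % len(channels)].queue.append(op)


def dispatch_least_loaded(requests, channels):
    """Send each request to the channel with the smallest weighted load."""
    for op in requests:
        # Ties are broken by channel id so the result is deterministic.
        target = min(channels, key=lambda c: (c.load(), c.ident))
        target.queue.append(op)


def max_load(channels):
    """Load of the busiest channel, i.e. the one that finishes last."""
    return max(c.load() for c in channels)


requests = ["write", "read", "read", "read", "write", "read", "read", "read"]
rr = [Channel(i) for i in range(4)]
ll = [Channel(i) for i in range(4)]
dispatch_round_robin(requests, rr)
dispatch_least_loaded(requests, ll)
print("round-robin busiest channel:", max_load(rr))
print("least-loaded busiest channel:", max_load(ll))
```

In this toy input, round-robin places both writes on the same channel, while the load-aware dispatcher spreads them out, so the busiest channel carries less work. A real system would also have to account for where data already resides, since reads and erases cannot be redirected freely.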



      Published In

      EuroSys '14: Proceedings of the Ninth European Conference on Computer Systems
      April 2014
      388 pages
      ISBN:9781450327046
      DOI:10.1145/2592798

      Publisher

      Association for Computing Machinery

      New York, NY, United States

      Author Tags

      1. flash
      2. key-value store
      3. log-structured merge tree
      4. solid state disk

      Conference

      EuroSys 2014
      Sponsor:
      EuroSys 2014: Ninth Eurosys Conference 2014
      April 14 - 16, 2014
      Amsterdam, The Netherlands

      Acceptance Rates

      EuroSys '14 Paper Acceptance Rate 27 of 147 submissions, 18%;
      Overall Acceptance Rate 241 of 1,308 submissions, 18%
