LSM-based storage techniques: a survey

Published: 19 July 2019

Abstract

Recently, the log-structured merge-tree (LSM-tree) has been widely adopted for use in the storage layer of modern NoSQL systems. Because of this, there have been a large number of research efforts, from both the database community and the operating systems community, that try to improve various aspects of LSM-trees. In this paper, we provide a survey of recent research efforts on LSM-trees so that readers can learn the state of the art in LSM-based storage techniques. We provide a general taxonomy to classify the literature of LSM-trees, survey the efforts in detail, and discuss their strengths and trade-offs. We further survey several representative LSM-based open-source NoSQL systems and discuss some potential future research directions resulting from the survey.

Information

Published In

The VLDB Journal — The International Journal on Very Large Data Bases  Volume 29, Issue 1
Jan 2020
586 pages

Publisher

Springer-Verlag, Berlin, Heidelberg

Publication History

Published: 19 July 2019
Accepted: 05 July 2019
Revision received: 17 April 2019
Received: 20 December 2018

Author Tags

  1. LSM-tree
  2. NoSQL
  3. Storage management
  4. Indexing

Qualifiers

  • Research-article
