The log-structured merge-tree (LSM-tree)

Published: 01 June 1996

Abstract

High-performance transaction system applications typically insert rows in a History table to provide an activity trace; at the same time the transaction system generates log records for purposes of system recovery. Both types of generated information can benefit from efficient indexing. An example in a well-known setting is the TPC-A benchmark application, modified to support efficient queries on the history of account activity for specific accounts. This requires an index by account-id on the fast-growing History table. Unfortunately, standard disk-based index structures such as the B-tree will effectively double the I/O cost of the transaction to maintain an index such as this in real time, increasing the total system cost by up to fifty percent. Clearly a method for maintaining a real-time index at low cost is desirable. The log-structured merge-tree (LSM-tree) is a disk-based data structure designed to provide low-cost indexing for a file experiencing a high rate of record inserts (and deletes) over an extended period. The LSM-tree uses an algorithm that defers and batches index changes, cascading the changes from a memory-based component through one or more disk components in an efficient manner reminiscent of merge sort. During this process all index values are continuously accessible to retrievals (aside from very short locking periods), either through the memory component or one of the disk components. The algorithm greatly reduces disk arm movements compared to traditional access methods such as B-trees, and will improve cost-performance in domains where disk arm costs for inserts with traditional access methods overwhelm storage media costs. The LSM-tree approach also generalizes to operations other than insert and delete. However, indexed finds requiring immediate response will lose I/O efficiency in some cases, so the LSM-tree is most useful in applications where index inserts are more common than finds that retrieve the entries. This seems to be a common property for history tables and log files, for example. The conclusions of Sect. 6 compare the hybrid use of memory and disk components in the LSM-tree access method with the commonly understood advantage of the hybrid method to buffer disk pages in memory.
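
The deferral and batching described in the abstract can be illustrated with a small sketch. This is not the paper's algorithm: it is a toy with one memory component and a single sorted list standing in for a disk component, and it assumes unique keys, whereas a real History index would hold many rows per account-id. The names `TinyLSM`, `c0`, `c1`, `c0_limit`, and `_TOMBSTONE` are hypothetical and chosen only for illustration.

```python
import bisect

_TOMBSTONE = object()  # marks a deferred delete until the next merge


class TinyLSM:
    """Toy two-component index: memory component c0, sorted 'disk' run c1."""

    def __init__(self, c0_limit=4):
        self.c0 = {}              # memory component: absorbs inserts and deletes
        self.c1 = []              # stand-in for a disk component: sorted (key, value) pairs
        self.c0_limit = c0_limit  # assumed threshold that triggers a merge
        self.merges = 0           # number of batched merges performed so far

    def insert(self, key, value):
        self.c0[key] = value      # no change to c1 yet: the index update is deferred
        self._maybe_merge()

    def delete(self, key):
        self.c0[key] = _TOMBSTONE
        self._maybe_merge()

    def find(self, key):
        # Entries stay reachable throughout: check memory first, then the sorted run.
        if key in self.c0:
            value = self.c0[key]
            return None if value is _TOMBSTONE else value
        i = bisect.bisect_left(self.c1, (key,))
        if i < len(self.c1) and self.c1[i][0] == key:
            return self.c1[i][1]
        return None

    def _maybe_merge(self):
        if len(self.c0) >= self.c0_limit:
            self._merge()

    def _merge(self):
        # One sequential pass over both sorted inputs, as in merge sort.
        batch = sorted(self.c0.items())
        merged, i, j = [], 0, 0
        while i < len(batch) or j < len(self.c1):
            take_new = j == len(self.c1) or (
                i < len(batch) and batch[i][0] <= self.c1[j][0]
            )
            if take_new:
                key, value = batch[i]
                if j < len(self.c1) and self.c1[j][0] == key:
                    j += 1        # the newer entry supersedes the older one
                i += 1
                if value is not _TOMBSTONE:
                    merged.append((key, value))
            else:
                merged.append(self.c1[j])
                j += 1
        self.c1 = merged
        self.c0 = {}
        self.merges += 1


index = TinyLSM(c0_limit=3)
for account_id in [42, 7, 19, 7, 3]:
    index.insert(account_id, f"history-row-for-{account_id}")
```

Five inserts here cause a single merge, so the sorted run is rewritten once rather than being updated on every insert. That is the batching effect the abstract attributes to the LSM-tree, reduced to its simplest form.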

Published In

Acta Informatica, Volume 33, Issue 4
Jun 1996
110 pages

Publisher

Springer-Verlag

Berlin, Heidelberg

Publication History

Published: 01 June 1996
Received: 11 April 1995

Author Tags

  1. Leaf Node
  2. Leaf Level
  3. Memory Buffer
  4. Access Rate
  5. Disk Component

Qualifiers

  • Research-article


Cited By

  • (2025) Holographic Storage for the Cloud: advances and challenges. ACM Transactions on Storage 21:1 (1-31). DOI: 10.1145/3708993. Online publication date: 8-Jan-2025
  • (2025) HLN-Tree: A memory-efficient B+-Tree with huge leaf nodes and locality predictors. ACM Transactions on Storage. DOI: 10.1145/3707641. Online publication date: 6-Jan-2025
  • (2025) RIOKV: reducing iterator overhead for efficient short-range query in LSM-tree-based key-value stores. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06735-0. Online publication date: 1-Jan-2025
  • (2025) PMCKV: pipeline-based multi-compactions KV stores to improve the system performance. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06680-y. Online publication date: 1-Jan-2025
  • (2025) CDNRocks: computable data nodes with RocksDB to improve the read performance of LSM-tree-based distributed key-value storage systems. The Journal of Supercomputing 81:1. DOI: 10.1007/s11227-024-06526-7. Online publication date: 1-Jan-2025
  • (2025) Tee-based key-value stores: a survey. The VLDB Journal — The International Journal on Very Large Data Bases 34:1. DOI: 10.1007/s00778-024-00877-6. Online publication date: 1-Jan-2025
  • (2025) An update-intensive LSM-based R-tree index. The VLDB Journal — The International Journal on Very Large Data Bases 34:1. DOI: 10.1007/s00778-024-00876-7. Online publication date: 1-Jan-2025
  • (2024) SlimArchive. Proceedings of the 2024 USENIX Conference on Usenix Annual Technical Conference (1257-1272). DOI: 10.5555/3691992.3692068. Online publication date: 10-Jul-2024
  • (2024) ELECT. Proceedings of the 22nd USENIX Conference on File and Storage Technologies (293-310). DOI: 10.5555/3650697.3650715. Online publication date: 27-Feb-2024
  • (2024) Petabyte-Scale Row-Level Operations in Data Lakehouses. Proceedings of the VLDB Endowment 17:12 (4159-4172). DOI: 10.14778/3685800.3685834. Online publication date: 8-Nov-2024
