research-article

Benchmarking learned indexes

Published: 01 September 2020

Abstract

Recent advancements in learned index structures propose replacing existing index structures, like B-Trees, with approximate learned models. In this work, we present a unified benchmark that compares well-tuned implementations of three learned index structures against several state-of-the-art "traditional" baselines. Using four real-world datasets, we demonstrate that learned index structures can indeed outperform non-learned indexes in read-only in-memory workloads over a dense array. We investigate the impact of caching, pipelining, dataset size, and key size. We study the performance profile of learned index structures, and build an explanation for why learned models achieve such good performance. Finally, we investigate other important properties of learned index structures, such as their performance in multi-threaded systems and their build times.
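
To make the setting in the abstract concrete, the following is a minimal sketch of a learned index for read-only lookups over a dense sorted array. It is not one of the implementations benchmarked in the paper, which are more elaborate. The class name `LinearLearnedIndex`, the single least-squares linear model, and the sample keys are all assumptions made for illustration. The sketch shows the general idea: a model predicts a key's position, and a search bounded by the model's maximum error corrects the prediction.

```python
import bisect


class LinearLearnedIndex:
    """Hypothetical one-model learned index over a sorted list of numeric keys."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("keys must be non-empty")
        self.keys = keys
        n = len(keys)
        # Least-squares fit of position ~ slope * key + intercept, i.e. a
        # linear approximation of the keys' (scaled) cumulative distribution.
        mean_key = sum(keys) / n
        mean_pos = (n - 1) / 2
        var = sum((k - mean_key) ** 2 for k in keys)
        cov = sum((k - mean_key) * (i - mean_pos) for i, k in enumerate(keys))
        self.slope = cov / var if var else 0.0
        self.intercept = mean_pos - self.slope * mean_key
        # Largest gap between a predicted and a true position; it bounds
        # the window that the final "last mile" search has to cover.
        self.max_err = max(abs(i - self._predict(k)) for i, k in enumerate(keys))

    def _predict(self, key):
        # Truncate to an integer position and clamp it into the array.
        pos = int(self.slope * key + self.intercept)
        return min(max(pos, 0), len(self.keys) - 1)

    def lower_bound(self, key):
        """Index of the first element >= key (len(keys) if there is none)."""
        guess = self._predict(key)
        lo = max(0, guess - self.max_err)
        # The +1 covers absent keys, whose lower bound can sit one slot
        # past the error window of the neighbouring stored key.
        hi = min(len(self.keys), guess + self.max_err + 1)
        return bisect.bisect_left(self.keys, key, lo, hi)


keys = [i * i for i in range(1000)]  # made-up, deliberately non-linear keys
index = LinearLearnedIndex(keys)
print(index.lower_bound(250_000), index.max_err)
```

A single linear model fits these quadratic keys poorly, so `max_err` is large and the bounded search does most of the work. This is one reason practical learned indexes tend to combine many small models rather than rely on one.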



Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 1
September 2020
73 pages
ISSN: 2150-8097

Publisher

VLDB Endowment



Cited By

  • (2025) Data-centric Artificial Intelligence: A Survey. ACM Computing Surveys 57:5 (1-42). DOI: 10.1145/3711118. Online publication date: 24-Jan-2025
  • (2025) Learning Road Network Index Structure for Efficient Map Matching. IEEE Transactions on Knowledge and Data Engineering 37:1 (423-437). DOI: 10.1109/TKDE.2024.3485195. Online publication date: 1-Jan-2025
  • (2025) How good are multi-dimensional learned indexes? An experimental survey. The VLDB Journal 34:2. DOI: 10.1007/s00778-024-00893-6. Online publication date: 21-Jan-2025
  • (2024) Benchmarking in online maritime education: tracing the evolution of assessment in electronic educational environments. Educational Dimension 11 (146-159). DOI: 10.55056/ed.804. Online publication date: 15-Dec-2024
  • (2024) Adaptive and Scalable Database Management with Machine Learning Integration: A PostgreSQL Case Study. Information 15:9 (574). DOI: 10.3390/info15090574. Online publication date: 18-Sep-2024
  • (2024) Revisiting Database Indexing for Parallel and Accelerated Computing: A Comprehensive Study and Novel Approaches. Information 15:8 (429). DOI: 10.3390/info15080429. Online publication date: 24-Jul-2024
  • (2024) Optimizing Database Performance in Complex Event Processing through Indexing Strategies. Data 9:8 (93). DOI: 10.3390/data9080093. Online publication date: 24-Jul-2024
  • (2024) Towards Systematic Index Dynamization. Proceedings of the VLDB Endowment 17:11 (2867-2879). DOI: 10.14778/3681954.3681969. Online publication date: 30-Aug-2024
  • (2024) Oasis: An Optimal Disjoint Segmented Learned Range Filter. Proceedings of the VLDB Endowment 17:8 (1911-1924). DOI: 10.14778/3659437.3659447. Online publication date: 31-May-2024
  • (2024) Accelerating String-Key Learned Index Structures via Memoization-Based Incremental Training. Proceedings of the VLDB Endowment 17:8 (1802-1815). DOI: 10.14778/3659437.3659439. Online publication date: 31-May-2024
