DOI: 10.1145/3401071.3401659
research-article
Open access

RadixSpline: a single-pass learned index

Published: 14 June 2020

Abstract

Recent research has shown that learned models can outperform state-of-the-art index structures in size and lookup performance. While this is a very promising result, existing learned structures are often cumbersome to implement and are slow to build. In fact, most approaches that we are aware of require multiple training passes over the data.
We introduce RadixSpline (RS), a learned index that can be built in a single pass over the data and is competitive with state-of-the-art learned index models, like RMI, in size and lookup performance. We evaluate RS using the SOSD benchmark and show that it achieves competitive results on all datasets, despite the fact that it only has two parameters.
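The abstract does not name the two parameters or describe the structure. The sketch below assumes, following the full paper, that they are a spline error bound and a number of radix bits. It is an illustrative toy, not the authors' implementation: the class name `RadixSplineSketch`, its arguments `max_error` and `radix_bits`, and the restriction to sorted, distinct, non-negative integer keys are all assumptions made for this example. The spline is fitted in one pass by tracking a corridor of admissible slopes from the last knot, and a small radix table narrows the search for the right spline segment.

```python
from bisect import bisect_left


class RadixSplineSketch:
    """Toy learned index: error-bounded linear spline plus a radix table."""

    def __init__(self, keys, max_error=8, radix_bits=8):
        # Assumption: keys is a non-empty sorted list of distinct integers.
        self.keys = keys
        self.eps = max_error
        self.knots = self._fit_spline(keys, max_error)
        self.knot_keys = [x for x, _ in self.knots]
        self.min_key = keys[0]
        span_bits = (keys[-1] - keys[0]).bit_length()
        self.shift = max(0, span_bits - radix_bits)
        # table[p] = index of the first knot whose radix prefix is >= p.
        self.table, j = [], 0
        for p in range((1 << radix_bits) + 1):
            while j < len(self.knots) and self._prefix(self.knot_keys[j]) < p:
                j += 1
            self.table.append(j)

    def _prefix(self, key):
        # Most significant bits of the key, relative to the smallest key.
        return (key - self.min_key) >> self.shift

    @staticmethod
    def _fit_spline(keys, eps):
        # Single pass over (key, position) pairs. The corridor [lo, hi] holds
        # the slopes from the last knot that keep every point seen since that
        # knot within eps positions of the line.
        base_x, base_y = keys[0], 0
        knots = [(base_x, base_y)]
        lo, hi = float("-inf"), float("inf")
        prev = None
        for y in range(1, len(keys)):
            x = keys[y]
            dx = x - base_x
            if not (lo <= (y - base_y) / dx <= hi):
                # The new point leaves the corridor: close the segment at the
                # previous point and start a fresh corridor from there.
                knots.append(prev)
                base_x, base_y = prev
                dx = x - base_x
                lo, hi = float("-inf"), float("inf")
            lo = max(lo, (y - eps - base_y) / dx)
            hi = min(hi, (y + eps - base_y) / dx)
            prev = (x, y)
        if prev is not None:
            knots.append(prev)  # the last key always ends the spline
        return knots

    def lookup(self, key):
        # Assumption: keys[0] <= key <= keys[-1]; exact for keys in the index.
        p = self._prefix(key)
        lo = self.table[p]
        hi = min(self.table[p + 1] + 1, len(self.knots))
        k = bisect_left(self.knot_keys, key, lo, hi)  # first knot >= key
        if k == 0:
            est = 0.0
        else:
            (x0, y0), (x1, y1) = self.knots[k - 1], self.knots[k]
            est = y0 + (key - x0) * (y1 - y0) / (x1 - x0)
        # Search only the error window around the estimate (one extra slot of
        # slack on each side absorbs floating-point rounding).
        lo = max(0, int(est) - self.eps - 1)
        hi = min(len(self.keys), int(est) + self.eps + 2)
        return bisect_left(self.keys, key, lo, hi)
```

Calling `RadixSplineSketch(keys).lookup(k)` for a key `k` stored in `keys` returns its position. A larger `max_error` yields fewer spline points but a wider final search window, while more `radix_bits` enlarge the table and shorten the search over spline points.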

Published In

aiDM '20: Proceedings of the Third International Workshop on Exploiting Artificial Intelligence Techniques for Data Management
June 2020
33 pages
ISBN:9781450380294
DOI:10.1145/3401071
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Conference

SIGMOD/PODS '20

Acceptance Rates

aiDM '20 Paper Acceptance Rate 6 of 6 submissions, 100%;
Overall Acceptance Rate 19 of 26 submissions, 73%
