research-article

Updatable learned index with precise positions

Published: 01 April 2021

Abstract

Indexes play an essential role in modern database engines by accelerating query processing. The new paradigm of "learned indexes" has significantly changed the way index structures are designed in DBMSs. The key insight is that an index can be regarded as a learned model that predicts the position of a lookup key in the dataset. While such studies show promising results in both lookup time and index size, they cannot efficiently support update operations. Although recent studies have proposed preliminary approaches to supporting updates, these come at the cost of sacrificing lookup performance, because they suffer from the overhead of imprecise predictions in the leaf nodes.
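To make the idea of "a model that predicts a key's position" concrete, the following is a minimal Python sketch, not code from the paper. It assumes a single linear model fitted through the smallest and largest keys (a simplifying assumption; practical learned indexes use better fits and hierarchies of models), and the class `LinearLearnedIndex` is hypothetical. Because the prediction can be off, the sketch records the maximum prediction error and finishes every lookup with a binary search inside that error window.

```python
from bisect import bisect_left


class LinearLearnedIndex:
    """Hypothetical toy learned index: one linear model over a sorted, non-empty key array."""

    def __init__(self, keys):
        self.keys = sorted(keys)
        n = len(self.keys)
        lo, hi = self.keys[0], self.keys[-1]
        # Assumed model: a straight line through (min key, 0) and (max key, n - 1).
        self.slope = (n - 1) / (hi - lo) if hi != lo else 0.0
        self.intercept = -self.slope * lo
        # Largest distance between predicted and true position over all keys.
        self.max_err = max(abs(self._predict(k) - i) for i, k in enumerate(self.keys))

    def _predict(self, key):
        # Model output, truncated to an int and clamped to a valid array index.
        p = int(self.slope * key + self.intercept)
        return min(max(p, 0), len(self.keys) - 1)

    def lookup(self, key):
        """Return the position of key in the sorted array, or None if absent."""
        p = self._predict(key)
        lo = max(p - self.max_err, 0)
        hi = min(p + self.max_err + 1, len(self.keys))
        # "Last-mile" search inside the error window: the extra work that
        # imprecise predictions impose on every lookup.
        i = bisect_left(self.keys, key, lo, hi)
        return i if i < hi and self.keys[i] == key else None
```

For example, `LinearLearnedIndex([1, 2, 4, 8, 16, 32, 64, 100]).lookup(16)` returns `4`. The `max_err` window corresponds to the overhead the abstract attributes to imprecise predictions: every lookup pays for a search whose width depends on how well the model fits, and inserting keys into the sorted array shifts positions away from what the model learned.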
In this paper, we propose LIPP, a new learned index framework that addresses these issues. Like state-of-the-art learned index structures, LIPP supports all kinds of index operations, namely lookup queries, range queries, inserts, deletes, updates, and bulk loading. Meanwhile, we overcome the limitations of previous studies by extending the tree structure appropriately when handling update operations, which eliminates the deviation of the positions predicted by the models in the leaf nodes. Moreover, we propose a dynamic adjustment strategy that ensures the height of the tree index is tightly bounded, and we provide a comprehensive theoretical analysis to support this bound. We conduct an extensive set of experiments on several real-life and synthetic datasets. The results demonstrate that our method consistently outperforms state-of-the-art solutions, achieving up to 4x better performance across a broader class of workloads with different index operations.
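The idea of resolving a prediction conflict by extending the tree, rather than tolerating a position error in the leaf, can be illustrated with a simplified Python sketch. It is an approximation under stated assumptions, not the authors' implementation: keys are integers, each node uses an integer linear model over its own key range, every slot is either empty, a (key, value) pair, or a child node, and the names `Node`, `PreciseLearnedIndex`, and `fanout` are hypothetical. When two keys predict the same slot, that slot is replaced by a child node whose model places them in different slots, so a lookup only follows predictions and never needs a last-mile search. Range queries, deletes, and the paper's dynamic adjustment strategy are omitted.

```python
class Node:
    """One node of a hypothetical LIPP-style index (integer keys only).

    Each slot is None (empty), a (key, value) tuple (data), or a child Node.
    """

    def __init__(self, lo, hi, size):
        self.lo, self.hi, self.size = lo, hi, size
        self.slots = [None] * size

    def predict(self, key):
        # Integer linear model mapping [lo, hi] onto [0, size - 1], clamped.
        if self.hi == self.lo:
            return 0
        pos = (key - self.lo) * (self.size - 1) // (self.hi - self.lo)
        return min(max(pos, 0), self.size - 1)


class PreciseLearnedIndex:
    def __init__(self, keys, fanout=8):
        keys = sorted(keys)  # assumed non-empty
        self.fanout = fanout  # assumed slot count for conflict nodes
        # Assumed root sizing: twice as many slots as keys.
        self.root = Node(keys[0], keys[-1], max(2 * len(keys), 2))
        for k in keys:
            self.insert(k, k)  # assumption: each key is its own value

    def lookup(self, key):
        node = self.root
        while True:
            slot = node.slots[node.predict(key)]
            if isinstance(slot, Node):
                node = slot  # the prediction points at a child: descend
            elif slot is not None and slot[0] == key:
                return slot[1]  # exact hit at the predicted slot
            else:
                return None  # empty slot or a different key

    def insert(self, key, value):
        node = self.root
        while True:
            i = node.predict(key)
            slot = node.slots[i]
            if slot is None:
                node.slots[i] = (key, value)
                return
            if isinstance(slot, Node):
                node = slot
                continue
            if slot[0] == key:
                node.slots[i] = (key, value)  # overwrite an existing key
                return
            # Conflict: two keys predict the same slot. Replace the data slot
            # with a child whose model maps the two keys to slots 0 and size - 1.
            old_key, old_val = slot
            child = Node(min(key, old_key), max(key, old_key), self.fanout)
            child.slots[child.predict(old_key)] = (old_key, old_val)
            node.slots[i] = child
            node = child  # loop again: key now lands in an empty slot
```

In this sketch, building from `[3, 10, 11, 12, 40, 41, 1000]` already nests several conflict nodes under the root's first slot, because the clustered keys keep colliding. Bounding that growth is the role of the dynamic adjustment strategy described in the abstract, which the sketch does not implement.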

References

[1]
Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In SIGMOD. 1009--1024.
[2]
Rudolf Bayer and Edward M. McCreight. 1972. Organization and Maintenance of Large Ordered Indices. Acta Informatica 1 (1972), 173--189.
[3]
Rudolf Bayer and Mario Schkolnick. 1977. Concurrency of Operations on B-Trees. Acta Informatica 9 (1977), 1--21.
[4]
Felix Beier and Kai-Uwe Sattler. 2017. GPU-GIST - a case of generalized database indexing on modern hardware. it Inf. Technol. 59, 3 (2017), 141.
[5]
Timo Bingmann. [n.d.]. STX B+ Tree. https://panthema.net/2007/stx-btree/, Version 0.9.
[6]
Robert Binna, Eva Zangerle, Martin Pichl, Günther Specht, and Viktor Leis. 2018. HOT: A Height Optimized Trie Index for Main-Memory Database Systems. In SIGMOD. 521--534.
[7]
Joan Boyar and Kim S. Larsen. 1992. Efficient Rebalancing of Chromatic Search Trees. In Algorithm Theory - SWAT, Otto Nurmi and Esko Ukkonen (Eds.), Vol. 621. 151--164.
[8]
Shimin Chen, Phillip B. Gibbons, and Todd C. Mowry. 2001. Improving Index Performance through Prefetching. In SIGMOD. 235--246.
[9]
Shimin Chen and Qin Jin. 2015. Persistent B+-Trees in Non-Volatile Main Memory. PVLDB 8, 7 (2015), 786--797.
[10]
Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, and Clifford Stein. 2009. Introduction to Algorithms, 3rd Edition. MIT Press. http://mitpress.mit.edu/books/introduction-algorithms
[11]
Jialin Ding, Umar Farooq Minhas, Jia Yu, Chi Wang, Jaeyoung Do, Yinan Li, Hantian Zhang, Badrish Chandramouli, Johannes Gehrke, Donald Kossmann, David B. Lomet, and Tim Kraska. 2020. ALEX: An Updatable Adaptive Learned Index. In SIGMOD. 969--984.
[12]
Paolo Ferragina and Giorgio Vinciguerra. 2020. The PGM-index: a fully-dynamic compressed learned index with provable worst-case bounds. PVLDB 13, 8 (2020), 1162--1175.
[13]
Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2018. A-Tree: A Bounded Approximate Index Structure. CoRR abs/1801.10207 (2018).
[14]
Alex Galakatos, Michael Markovitch, Carsten Binnig, Rodrigo Fonseca, and Tim Kraska. 2019. FITing-Tree: A Data-aware Index Structure. In SIGMOD. 1189--1206.
[15]
Richard A. Hankins and Jignesh M. Patel. 2003. Effect of node size on the performance of cache-conscious B+-trees. In SIGMETRICS. 283--294.
[16]
Chen-Yu Hsu, Piotr Indyk, Dina Katabi, and Ali Vakilian. 2019. Learning-Based Frequency Estimation Algorithms. In ICLR. OpenReview.net.
[17]
Martin V. Jørgensen, René Bech Rasmussen, Simonas Saltenis, and Carsten Schjønning. 2011. FB-tree: a B+-tree for flash-based SSDs. In IDEAS. ACM, 34--42.
[18]
Changkyu Kim, Jatin Chhugani, Nadathur Satish, Eric Sedlar, Anthony D. Nguyen, Tim Kaldewey, Victor W. Lee, Scott A. Brandt, and Pradeep Dubey. 2010. FAST: fast architecture sensitive tree search on modern CPUs and GPUs. In SIGMOD. 339--350.
[19]
Mincheol Kim, Ling Liu, and Wonik Choi. 2018. A GPU-Aware Parallel Index for Processing High-Dimensional Big Data. IEEE Trans. Computers 67, 10 (2018), 1388--1402.
[20]
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2019. SOSD: A Benchmark for Learned Indexes. CoRR abs/1911.13014 (2019).
[21]
Andreas Kipf, Ryan Marcus, Alexander van Renen, Mihail Stoian, Alfons Kemper, Tim Kraska, and Thomas Neumann. 2020. RadixSpline: a single-pass learned index. In aiDM@SIGMOD 2020. 5:1--5:5.
[22]
Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In SIGMOD. 489--504.
[23]
Ani Kristo, Kapil Vaidya, Ugur Çetintemel, Sanchit Misra, and Tim Kraska. 2020. The Case for a Learned Sorting Algorithm. In SIGMOD. 1001--1016.
[24]
Philip L. Lehman and S. Bing Yao. 1981. Efficient Locking for Concurrent Operations on B-Trees. ACM Trans. Database Syst. 6, 4 (1981), 650--670.
[25]
Tobin J. Lehman and Michael J. Carey. 1986. A Study of Index Structures for Main Memory Database Management Systems. In VLDB, Wesley W. Chu, Georges Gardarin, Setsuo Ohsuga, and Yahiko Kambayashi (Eds.). 294--303.
[26]
Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE. 38--49.
[27]
Justin J. Levandoski, David B. Lomet, and Sudipta Sengupta. 2013. The Bw-Tree: A B-tree for new hardware platforms. In ICDE. IEEE Computer Society, 302--313.
[28]
Sungchae Lim, Joonseon Ahn, and Myoung-Ho Kim. 2003. A Concurrent B-Tree Algorithm Using a Cooperative Locking Protocol. In BNCOD, Vol. 2712. 253--260.
[29]
Jihang Liu, Shimin Chen, and Lujun Wang. 2020. LB+-Trees: Optimizing Persistent Index Performance on 3DXPoint Memory. PVLDB 13, 7 (2020), 1078--1090.
[30]
Lin Ma, Dana Van Aken, Ahmed Hefny, Gustavo Mezerhane, Andrew Pavlo, and Geoffrey J. Gordon. 2018. Query-based Workload Forecasting for Self-Driving Database Management Systems. In SIGMOD. 631--645.
[31]
Ryan Marcus, Andreas Kipf, Alexander van Renen, Mihail Stoian, Sanchit Misra, Alfons Kemper, Thomas Neumann, and Tim Kraska. 2020. Benchmarking Learned Indexes. CoRR abs/2006.12804 (2020).
[32]
Ryan Marcus and Olga Papaemmanouil. 2019. Towards a Hands-Free Query Optimizer through Deep Learning. In CIDR.
[33]
Ryan C. Marcus, Parimarjan Negi, Hongzi Mao, Chi Zhang, Mohammad Alizadeh, Tim Kraska, Olga Papaemmanouil, and Nesime Tatbul. 2019. Neo: A Learned Query Optimizer. PVLDB 12, 11 (2019), 1705--1718.
[34]
Ryan C. Marcus and Olga Papaemmanouil. 2019. Plan-Structured Deep Neural Network Models for Query Performance Prediction. PVLDB 12, 11 (2019), 1733--1746.
[35]
Pak Markthub, Mehmet E. Belviranli, Seyong Lee, Jeffrey S. Vetter, and Satoshi Matsuoka. 2018. DRAGON: breaking GPU memory capacity limits with direct NVM access. In SC. IEEE / ACM, 32:1--32:13.
[36]
C. Mohan. 1990. ARIES/KVL: A Key-Value Locking Method for Concurrency Control of Multiaction Transactions Operating on B-Tree Indexes. In VLDB, Dennis McLeod, Ron Sacks-Davis, and Hans-Jörg Schek (Eds.). Morgan Kaufmann, 392--405.
[37]
Vikram Nathan, Jialin Ding, Mohammad Alizadeh, and Tim Kraska. 2020. Learning Multi-Dimensional Indexes. In SIGMOD. 985--1000.
[38]
Patrick E. O'Neil, Edward Cheng, Dieter Gawlick, and Elizabeth J. O'Neil. 1996. The Log-Structured Merge-Tree (LSM-Tree). Acta Informatica 33, 4 (1996), 351--385.
[39]
Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self-Driving Database Management Systems. In CIDR.
[40]
Jun Rao and Kenneth A. Ross. 1999. Cache Conscious Indexing for Decision-Support in Main Memory. In VLDB, Malcolm P. Atkinson, Maria E. Orlowska, Patrick Valduriez, Stanley B. Zdonik, and Michael L. Brodie (Eds.). 78--89.
[41]
Dimitrios Siakavaras, Panagiotis Billis, Konstantinos Nikas, Georgios I. Goumas, and Nectarios Koziris. 2020. Efficient Concurrent Range Queries in B+-trees using RCU-HTM. In SPAA, Christian Scheideler and Michael Spear (Eds.). ACM, 571--573.
[42]
Chuzhe Tang, Youyun Wang, Zhiyuan Dong, Gansen Hu, Zhaoguo Wang, Minjie Wang, and Haibo Chen. 2020. XIndex: a scalable learned index for multicore data storage. In PPoPP. ACM, 308--320.
[43]
Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, and Sriram Rao. 2018. Towards a Learning Optimizer for Shared Clouds. PVLDB 12, 3 (2018), 210--222.
[44]
Jiacheng Wu, Yong Zhang, Shimin Chen, Jin Wang, Yu Chen, and Chunxiao Xing. 2021. Updatable Learned Index with Precise Positions. arXiv:2104.05520 [cs.DB]
[45]
Lei Yang, Hong Wu, Tieying Zhang, Xuntao Cheng, Feifei Li, Lei Zou, Yujie Wang, Rongyao Chen, Jianying Wang, and Gui Huang. 2020. Leaper: A Learned Prefetcher for Cache Invalidation in LSM-tree based Storage Engines. PVLDB 13, 11 (2020), 1976--1989.
[46]
Zongheng Yang, Badrish Chandramouli, Chi Wang, Johannes Gehrke, Yinan Li, Umar Farooq Minhas, Per-Åke Larson, Donald Kossmann, and Rajeev Acharya. 2020. Qd-tree: Learning Data Layouts for Big Data Analytics. In SIGMOD. 193--208.
[47]
Zongheng Yang, Eric Liang, Amog Kamsetty, Chenggang Wu, Yan Duan, Peter Chen, Pieter Abbeel, Joseph M. Hellerstein, Sanjay Krishnan, and Ion Stoica. 2019. Deep Unsupervised Cardinality Estimation. PVLDB 13, 3 (2019), 279--292.
[48]
Chi Zhang, Ryan Marcus, Anat Kleiman, and Olga Papaemmanouil. 2020. Buffer Pool Aware Query Scheduling via Deep Reinforcement Learning. CoRR abs/2007.10568 (2020).

Information & Contributors

Information

Published In

Proceedings of the VLDB Endowment, Volume 14, Issue 8
April 2021
200 pages
ISSN:2150-8097

Publisher

VLDB Endowment

Publication History

Published: 01 April 2021
Published in PVLDB Volume 14, Issue 8

Qualifiers

  • Research-article

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 68
  • Downloads (last 6 weeks): 3
Reflects downloads up to 09 Nov 2024

Citations

Cited By

  • (2024) LITS: An Optimized Learned Index for Strings. Proceedings of the VLDB Endowment 17:11 (3415-3427). DOI: 10.14778/3681954.3682010. Online publication date: 1-Jul-2024.
  • (2024) Accelerating String-Key Learned Index Structures via Memoization-Based Incremental Training. Proceedings of the VLDB Endowment 17:8 (1802-1815). DOI: 10.14778/3659437.3659439. Online publication date: 1-Apr-2024.
  • (2024) SLIPP: A Space-Efficient Learned Index for String Keys. Proceedings of the 2024 6th International Conference on Big-data Service and Intelligent Computation (69-77). DOI: 10.1145/3686540.3686550. Online publication date: 29-May-2024.
  • (2024) Kanva: A Lock-free Learned Search Data Structure. Proceedings of the 53rd International Conference on Parallel Processing (252-261). DOI: 10.1145/3673038.3673082. Online publication date: 12-Aug-2024.
  • (2024) Benchmarking Learned and LSM Indexes for Data Sortedness. Proceedings of the Tenth International Workshop on Testing Database Systems (16-22). DOI: 10.1145/3662165.3662764. Online publication date: 9-Jun-2024.
  • (2024) Making In-Memory Learned Indexes Efficient on Disk. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654954. Online publication date: 30-May-2024.
  • (2024) Hyper: A High-Performance and Memory-Efficient Learned Index via Hybrid Construction. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654948. Online publication date: 30-May-2024.
  • (2024) Can Learned Indexes be Built Efficiently? A Deep Dive into Sampling Trade-offs. Proceedings of the ACM on Management of Data 2:3 (1-25). DOI: 10.1145/3654919. Online publication date: 30-May-2024.
  • (2024) One Seed, Two Birds: A Unified Learned Structure for Exact and Approximate Counting. Proceedings of the ACM on Management of Data 2:1 (1-26). DOI: 10.1145/3639270. Online publication date: 26-Mar-2024.
  • (2024) LSGraph: A Locality-centric High-performance Streaming Graph Engine. Proceedings of the Nineteenth European Conference on Computer Systems (33-49). DOI: 10.1145/3627703.3650076. Online publication date: 22-Apr-2024.