research-article

Are updatable learned indexes ready?

Published: 01 July 2022

Abstract

Recently, numerous promising results have shown that updatable learned indexes can outperform traditional indexes while consuming much less memory. However, it is unknown how these learned indexes compare against each other and against traditional indexes under realistic workloads with changing data distributions and concurrency levels. As a result, practitioners remain wary of how these new indexes would actually behave in practice. To fill this gap, this paper conducts the first comprehensive evaluation of updatable learned indexes. Our evaluation uses ten real datasets and various workloads to challenge learned indexes in three aspects: performance, memory space efficiency, and robustness. Based on the results, we give a series of takeaways that can guide the future development and deployment of learned indexes.

Published In

Proceedings of the VLDB Endowment  Volume 15, Issue 11
July 2022
980 pages
ISSN:2150-8097
Publisher

VLDB Endowment

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 76
  • Downloads (last 6 weeks): 15
Reflects downloads up to 04 Oct 2024.

Cited By

  • (2024) LITS: An Optimized Learned Index for Strings. Proceedings of the VLDB Endowment 17:11 (3415-3427). DOI: 10.14778/3681954.3682010. Online publication date: 1-Jul-2024
  • (2024) SLIPP: A Space-Efficient Learned Index for String Keys. Proceedings of the 2024 6th International Conference on Big-data Service and Intelligent Computation (69-77). DOI: 10.1145/3686540.3686550. Online publication date: 29-May-2024
  • (2024) Making In-Memory Learned Indexes Efficient on Disk. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654954. Online publication date: 30-May-2024
  • (2024) Hyper: A High-Performance and Memory-Efficient Learned Index via Hybrid Construction. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654948. Online publication date: 30-May-2024
  • (2024) Can Learned Indexes be Built Efficiently? A Deep Dive into Sampling Trade-offs. Proceedings of the ACM on Management of Data 2:3 (1-25). DOI: 10.1145/3654919. Online publication date: 30-May-2024
  • (2024) LeCo: Lightweight Compression via Learning Serial Correlations. Proceedings of the ACM on Management of Data 2:1 (1-28). DOI: 10.1145/3639320. Online publication date: 26-Mar-2024
  • (2024) Morphtree: a polymorphic main-memory learned index for dynamic workloads. The VLDB Journal (The International Journal on Very Large Data Bases) 33:4 (1065-1084). DOI: 10.1007/s00778-023-00823-y. Online publication date: 1-Jul-2024
  • (2023) Algorithmic Complexity Attacks on Dynamic Learned Indexes. Proceedings of the VLDB Endowment 17:4 (780-793). DOI: 10.14778/3636218.3636232. Online publication date: 1-Dec-2023
  • (2023) DILI: A Distribution-Driven Learned Index. Proceedings of the VLDB Endowment 16:9 (2212-2224). DOI: 10.14778/3598581.3598593. Online publication date: 1-May-2023
  • (2023) Learned Index: A Comprehensive Experimental Evaluation. Proceedings of the VLDB Endowment 16:8 (1992-2004). DOI: 10.14778/3594512.3594528. Online publication date: 22-Jun-2023