research-article
Open access

The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data

Published: 30 May 2023

Abstract

Learned indexes have been proposed to replace classic index structures such as the B-Tree with machine learning (ML) models. They require replacing both the indexes and the query processing algorithms currently deployed by databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to build a better R-Tree without changing the structure or query processing algorithms of the traditional R-Tree. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building and updating an R-Tree, instead of relying on the hand-crafted heuristic rules currently used by the R-Tree and its variants. Experiments on real and synthetic datasets, the largest with more than 100 million spatial objects, show that our RL based index outperforms the R-Tree and its variants in terms of query processing time.
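
To make the two decision points concrete, the sketch below shows the kind of hand-crafted rule the abstract refers to: the classic least-area-enlargement choice of subtree for insertion. It is an illustration only, not the paper's RL model. The names `Rect`, `enlargement`, `Chooser`, `least_enlargement`, and `choose_subtree` are hypothetical and do not come from the paper; the `chooser` parameter merely marks where a learned policy could, under this assumed interface, take over the decision while the tree structure and query algorithms stay unchanged.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Rect:
    """Axis-aligned minimum bounding rectangle (MBR). Hypothetical helper type."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def area(self) -> float:
        return (self.xmax - self.xmin) * (self.ymax - self.ymin)

    def union(self, other: "Rect") -> "Rect":
        # Smallest rectangle that covers both self and other.
        return Rect(min(self.xmin, other.xmin), min(self.ymin, other.ymin),
                    max(self.xmax, other.xmax), max(self.ymax, other.ymax))


def enlargement(mbr: Rect, obj: Rect) -> float:
    """Extra area the MBR would need in order to cover obj."""
    return mbr.union(obj).area() - mbr.area()


# A chooser maps (child MBRs, new object) to the index of the child to descend into.
Chooser = Callable[[Sequence[Rect], Rect], int]


def least_enlargement(children: Sequence[Rect], obj: Rect) -> int:
    """Hand-crafted heuristic: least area enlargement, ties broken by smaller area."""
    return min(range(len(children)),
               key=lambda i: (enlargement(children[i], obj), children[i].area()))


def choose_subtree(children: Sequence[Rect], obj: Rect,
                   chooser: Chooser = least_enlargement) -> int:
    """Assumed hook: any policy with the Chooser signature (for example a
    learned one) can replace the default heuristic without touching the tree."""
    return chooser(children, obj)


if __name__ == "__main__":
    kids = [Rect(0, 0, 2, 2), Rect(5, 5, 6, 6)]
    print(choose_subtree(kids, Rect(1, 1, 3, 3)))
```

Because only the chooser varies, queries run against the resulting tree exactly as they would against an ordinary R-Tree; a node-split policy could be parameterized in the same way.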

    Information

    Published In

    Proceedings of the ACM on Management of Data  Volume 1, Issue 1
    PACMMOD
    May 2023
    2807 pages
    EISSN:2836-6573
    DOI:10.1145/3603164
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. deep learning
    2. learned index
    3. reinforcement learning
    4. spatial data
    5. spatial query processing

    Qualifiers

    • Research-article

    Funding Sources

    • Alibaba-NTU Singapore Joint Research Institute

