research-article

Open access

The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data

Authors:

Sheng Wang

Proceedings of the ACM on Management of Data, Volume 1, Issue 1

Article No.: 63, Pages 1 - 26

https://doi.org/10.1145/3588917

Published: 30 May 2023

Abstract

Learned indexes have been proposed to replace classic index structures such as the B-Tree with machine learning (ML) models. They require replacing both the indexes and the query processing algorithms currently deployed by databases, and such a radical departure is likely to encounter challenges and obstacles. In contrast, we propose a fundamentally different way of using ML techniques to build a better R-Tree without changing the structure or the query processing algorithms of the traditional R-Tree. Specifically, we develop reinforcement learning (RL) based models to decide how to choose a subtree for insertion and how to split a node when building and updating an R-Tree, instead of relying on the hand-crafted heuristic rules currently used by the R-Tree and its variants. Experiments on real and synthetic datasets, the largest with more than 100 million spatial objects, show that our RL-based index outperforms the R-Tree and its variants in terms of query processing time.
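
To make the insertion decision mentioned above concrete, the following is a minimal sketch of a ChooseSubtree step with a pluggable policy. It is not the paper's model: the default policy is the classic hand-crafted rule from the original R-Tree (least area enlargement, ties broken by smaller area), and `scored_policy` is a hypothetical hook showing where a learned scoring function could be substituted while the tree structure and query algorithms stay unchanged. All function and type names here are assumptions made for illustration.

```python
from typing import Callable, Sequence, Tuple

# Axis-aligned rectangle: (xmin, ymin, xmax, ymax).
Rect = Tuple[float, float, float, float]

# A policy maps (child MBRs, new object) to the index of the chosen child.
ChoosePolicy = Callable[[Sequence[Rect], Rect], int]


def area(r: Rect) -> float:
    """Area of a rectangle."""
    return (r[2] - r[0]) * (r[3] - r[1])


def union(a: Rect, b: Rect) -> Rect:
    """Minimum bounding rectangle (MBR) covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(mbr: Rect, obj: Rect) -> float:
    """Extra area the MBR needs in order to also cover obj."""
    return area(union(mbr, obj)) - area(mbr)


def least_enlargement_policy(children: Sequence[Rect], obj: Rect) -> int:
    """Hand-crafted heuristic: least enlargement, ties by smaller area."""
    return min(
        range(len(children)),
        key=lambda i: (enlargement(children[i], obj), area(children[i])),
    )


def scored_policy(score: Callable[[Rect, Rect], float]) -> ChoosePolicy:
    """Hypothetical hook: pick the child with the highest score.

    `score` stands in for a learned model; here it is any plain function.
    """
    def policy(children: Sequence[Rect], obj: Rect) -> int:
        return max(range(len(children)), key=lambda i: score(children[i], obj))
    return policy


def choose_subtree(
    children: Sequence[Rect],
    obj: Rect,
    policy: ChoosePolicy = least_enlargement_policy,
) -> int:
    """Return the index of the child node to descend into for insertion."""
    return policy(children, obj)


if __name__ == "__main__":
    children = [(0.0, 0.0, 2.0, 2.0), (5.0, 5.0, 9.0, 9.0)]
    obj = (1.0, 1.0, 3.0, 3.0)
    print(choose_subtree(children, obj))
    # Toy scorer (an assumption, not a trained model): prefer larger children.
    print(choose_subtree(children, obj, scored_policy(lambda c, o: area(c))))
```

Because the policy only changes which child is selected, the resulting tree is still an ordinary R-Tree, which matches the abstract's point that existing query processing can be reused as is.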

Index Terms

The RLR-Tree: A Reinforcement Learning Based R-Tree for Spatial Data
1. Information systems
  1. Data management systems

Published In

Proceedings of the ACM on Management of Data Volume 1, Issue 1

PACMMOD

May 2023

2807 pages

EISSN:2836-6573

DOI:10.1145/3603164

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Copyright © 2023 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Funding Sources

Alibaba-NTU Singapore Joint Research Institute
