Indexer++: workload-aware online index tuning with transformers and reinforcement learning

Authors:

Curtis Dyreson

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing

Pages 372 - 380

https://doi.org/10.1145/3477314.3507691

Published: 06 May 2022

Abstract

As workloads in modern databases grow more complex, manual index selection becomes increasingly difficult, and there is a growing need for databases that can learn and adapt to evolving workloads. This paper proposes Indexer++, an autonomous, workload-aware, online index tuner. Unlike existing approaches, Indexer++ imposes low overhead on the DBMS, responds to changes in query workloads, and selects indexes swiftly. Our approach combines text analytics techniques with reinforcement learning. Indexer++ consists of two phases. Phase (i) learns workload trends using a novel trend detection technique based on a pre-trained transformer model. Phase (ii) performs online index selection, i.e., selection that runs continuously while the DBMS is processing workloads, using a novel online deep reinforcement learning technique built on our proposed priority experience sweeping. This paper provides an experimental evaluation of Indexer++ in multiple scenarios using a benchmark dataset (TPC-H) and a real-world dataset (IMDB). In our experiments, Indexer++ effectively identifies changes in workload trends and selects the set of optimal indexes.
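
To make the idea of workload trend detection in Phase (i) concrete, the following is a minimal sketch and not the paper's method. It assumes each query can be mapped to a vector and that a trend change shows up as low similarity between two consecutive windows of queries. The bag-of-tokens `embed` function is a deliberately simple stand-in for a pre-trained transformer embedding, and the names `embed`, `centroid`, `cosine`, `trend_changed`, the keyword list, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import re
from collections import Counter
from typing import Iterable

# Hypothetical keyword list: common SQL words are dropped so that table and
# column names dominate the comparison.
SQL_KEYWORDS = {"select", "from", "where", "and", "or", "group", "by", "order"}


def embed(query: str) -> Counter:
    """Stand-in embedding: a bag of identifier tokens (NOT a transformer)."""
    tokens = re.findall(r"[a-z_][a-z0-9_]*", query.lower())
    return Counter(t for t in tokens if t not in SQL_KEYWORDS)


def centroid(queries: Iterable[str]) -> Counter:
    """Represent a window of queries by the sum of its query vectors."""
    total: Counter = Counter()
    for query in queries:
        total.update(embed(query))
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def trend_changed(previous: Iterable[str], current: Iterable[str],
                  threshold: float = 0.5) -> bool:
    """Flag a trend change when the two windows look dissimilar.

    The threshold is an assumption for this sketch, not a value from the paper.
    """
    return cosine(centroid(previous), centroid(current)) < threshold


if __name__ == "__main__":
    tpch_window = [
        "SELECT l_quantity FROM lineitem WHERE l_shipdate < 10",
        "SELECT l_tax FROM lineitem WHERE l_shipdate > 5",
    ]
    imdb_window = [
        "SELECT title FROM movies WHERE year = 1999",
        "SELECT name FROM actors WHERE id = 7",
    ]
    print(trend_changed(tpch_window, tpch_window))  # same workload
    print(trend_changed(tpch_window, imdb_window))  # different tables and columns
```

In a tuner structured like the one the abstract describes, a detected change would presumably be the signal to revisit the current index selection; how Indexer++ actually detects trends and reacts to them is specified in the paper, not here.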

Cited By

Yu T, Zou Z, Sun W, Yan Y (2024). Refactoring Index Tuning Process with Benefit Estimation. Proceedings of the VLDB Endowment 17:7 (1528-1541). DOI: 10.14778/3654621.3654622. Online publication date: 30-May-2024
https://dl.acm.org/doi/10.14778/3654621.3654622
Chang Z, Zhang X, Li Y, Miao X, Qin Y, Cui B (2024). MFIX: An Efficient and Reliable Index Advisor via Multi-Fidelity Bayesian Optimization. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (4343-4356). DOI: 10.1109/ICDE60146.2024.00331. Online publication date: 13-May-2024
https://doi.org/10.1109/ICDE60146.2024.00331

Index Terms

Indexer++: workload-aware online index tuning with transformers and reinforcement learning
1. Information systems
  1. Data management systems

Published In

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing

April 2022

2099 pages

ISBN: 9781450387132

DOI: 10.1145/3477314

Conference Chairs:
Jiman Hong (Soongsil University), Miroslav Bures (Czech Technical University, Czechia)

Program Chairs:
Juw Won Park (University of Louisville), Tomas Cerny (Baylor University)

Copyright © 2022 Owner/Author.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike International 4.0 License.

Sponsors

SIGAPP: ACM Special Interest Group on Applied Computing

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 May 2022

Qualifiers

Research-article

Funding Sources

National Science Foundation

Conference

SAC '22

Sponsor:

SIGAPP

SAC '22: The 37th ACM/SIGAPP Symposium on Applied Computing

April 25 - 29, 2022

Virtual Event

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%
