DOI: 10.1145/3477314.3507691
research-article
Open access

Indexer++: workload-aware online index tuning with transformers and reinforcement learning

Published: 06 May 2022

Abstract

As workloads in modern databases grow more complex, manual index selection becomes a challenging task. There is a growing need for a database that can learn and adapt to evolving workloads. This paper proposes Indexer++, an autonomous, workload-aware, online index tuner. Unlike existing approaches, Indexer++ imposes low overhead on the DBMS, responds to changes in query workloads, and selects indexes swiftly. Our approach combines text analytic techniques with reinforcement learning. Indexer++ consists of two phases. Phase (i) learns workload trends using a novel trend detection technique based on a pre-trained transformer model. Phase (ii) performs online index selection, i.e., selection that runs continuously while the DBMS is processing workloads, using a novel online deep reinforcement learning technique built on our proposed priority experience sweeping. This paper provides an experimental evaluation of Indexer++ in multiple scenarios using a benchmark dataset (TPC-H) and a real-world dataset (IMDB). In our experiments, Indexer++ effectively identifies changes in workload trends and selects the set of optimal indexes.
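To make the idea behind Phase (i) concrete, the following is a minimal sketch of window-based workload trend detection. It is not the method from the paper: the pre-trained transformer embedding is swapped for a toy bag-of-identifiers vector so that the example runs with the Python standard library alone, and the names `embed`, `centroid`, `cosine`, `trend_changed`, the keyword list, and the 0.5 threshold are all assumptions made for illustration.

```python
import math
import re
from collections import Counter

# Assumption: a small SQL keyword list, so that only table and column names
# contribute to the vector. A real system would embed the whole query text.
SQL_KEYWORDS = {"select", "from", "where", "and", "or", "group", "by",
                "order", "join", "on", "as", "in", "limit"}


def embed(query: str) -> Counter:
    """Toy stand-in for a transformer embedding: counts of identifiers."""
    tokens = re.findall(r"[a-z_][a-z0-9_]*", query.lower())
    return Counter(t for t in tokens if t not in SQL_KEYWORDS)


def centroid(queries) -> Counter:
    """Sum the per-query vectors of one workload window."""
    total = Counter()
    for q in queries:
        total.update(embed(q))
    return total


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def trend_changed(previous, current, threshold: float = 0.5) -> bool:
    """Flag a trend change when two windows are less similar than threshold."""
    return cosine(centroid(previous), centroid(current)) < threshold


if __name__ == "__main__":
    old_window = [
        "SELECT o_orderkey FROM orders WHERE o_orderdate < 5",
        "SELECT o_totalprice FROM orders WHERE o_orderdate > 3",
    ]
    new_window = [
        "SELECT l_quantity FROM lineitem WHERE l_shipdate < 9",
        "SELECT l_tax FROM lineitem WHERE l_shipdate > 2",
    ]
    print(trend_changed(old_window, new_window))
```

In this sketch, a detected change would be the signal to re-run index selection (Phase (ii)); comparing window centroids rather than individual queries keeps a single unusual query from triggering retuning.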

Cited By

  • (2024) Refactoring Index Tuning Process with Benefit Estimation. Proceedings of the VLDB Endowment 17:7 (1528-1541). DOI: 10.14778/3654621.3654622. Online publication date: 30-May-2024
  • (2024) MFIX: An Efficient and Reliable Index Advisor via Multi-Fidelity Bayesian Optimization. 2024 IEEE 40th International Conference on Data Engineering (ICDE) (4343-4356). DOI: 10.1109/ICDE60146.2024.00331. Online publication date: 13-May-2024

Published In

SAC '22: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing
April 2022
2099 pages
ISBN:9781450387132
DOI:10.1145/3477314
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. online index selection
  2. pre-trained transformers
  3. reinforcement learning
  4. workload trend detection

Qualifiers

  • Research-article

Acceptance Rates

Overall Acceptance Rate 1,650 of 6,669 submissions, 25%

Article Metrics

  • Downloads (Last 12 months)301
  • Downloads (Last 6 weeks)38
Reflects downloads up to 30 Aug 2024
