Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3299869.3324957acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

AI Meets AI: Leveraging Query Executions to Improve Index Recommendations

Published: 25 June 2019 Publication History

Abstract

State-of-the-art index tuners rely on query optimizer's cost estimates to search for the index configuration with the largest estimated execution cost improvement`. Due to well-known limitations in optimizer's estimates, in a significant fraction of cases, an index estimated to improve a query's execution cost, e.g., CPU time, makes that worse when implemented. Such errors are a major impediment for automated indexing in production systems. We observe that comparing the execution cost of two plans of the same query corresponding to different index configurations is a key step during index tuning. Instead of using optimizer's estimates for such comparison, our key insight is that formulating it as a classification task in machine learning results in significantly higher accuracy. We present a study of the design space for this classification problem. We further show how to integrate this classifier into the state-of-the-art index tuners with minimal modifications, i.e., how artificial intelligence (AI) can benefit automated indexing (AI). Our evaluation using industry-standard benchmarks and a large number of real customer workloads demonstrates up to 5x reduction in the errors in identifying the cheaper plan in a pair, which eliminates almost all query execution cost regressions when the model is used in index tuning.

References

[1]
Martín Abadi et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI. 265--283. https://www.usenix.org/conference/ osdi16/technical-sessions/presentation/abadi
[2]
Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollár, Arunprasad P. Marathe, Vivek R. Narasayya, and Manoj Syamala. 2004. Database Tuning Advisor for Microsoft SQL Server 2005. In VLDB. 1110--1121.
[3]
Sanjay Agrawal, Eric Chu, and Vivek Narasayya. 2006. Automatic physical design tuning: workload as a sequence. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data. ACM, 683--694.
[4]
Sanjay Agrawal, Vivek R. Narasayya, and Beverly Yang. 2004. Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In SIGMOD. 359--370.
[5]
Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, and Stanley B. Zdonik. 2012. Learning-based Query Performance Modeling and Prediction. In ICDE. IEEE Computer Society, Washington, DC, USA, 390--401.
[6]
Ioannis Alagiannis, Stratos Idreos, and Anastasia Ailamaki. 2014. H2O: a hands-free adaptive store. In SIGMOD. 1103--1114.
[7]
Joy Arulraj, Andrew Pavlo, and Prashanth Menon. 2016. Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. In SIGMOD. 583--598.
[8]
Azure SQL Database {n. d.}. Azure SQL Database. https://azure. microsoft.com/en-us/services/sql-database/.
[9]
Christopher M. Bishop. 2007. Pattern recognition and machine learning, 5th Edition. Springer. http://www.worldcat.org/oclc/71008143
[10]
Renata Borovica, Ioannis Alagiannis, and Anastasia Ailamaki. 2012. Automated physical designers: what you see is (not) what you get. In DBTest. 9.
[11]
Kurt P. Brown, Michael J. Carey, and Miron Livny. 1996. Goal-oriented Buffer Management Revisited. In SIGMOD. ACM, New York, NY, USA, 353--364.
[12]
Nicolas Bruno and Surajit Chaudhuri. 2007. An Online Approach to Physical Design Tuning. In ICDE. 826--835.
[13]
Nicolas Bruno and Surajit Chaudhuri. 2007. Physical design refinement: The 'merge-reduce' approach. ACM Trans. Database Syst. 32, 4 (2007), 28.
[14]
Surajit Chaudhuri. 1998. An Overview of Query Optimization in Relational Systems. In PODS. ACM, New York, NY, USA, 34--43.
[15]
Surajit Chaudhuri. 2009. Query optimizers: time to rethink the contract?. In SIGMOD. 961--968.
[16]
Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy. 2008. A pay-as-you-go framework for query execution feedback. PVLDB 1, 1 (2008), 1141--1152.
[17]
Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost- Driven Index Selection Tool for Microsoft SQL Server. In VLDB. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 146--155.
[18]
Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD. 367--378.
[19]
Surajit Chaudhuri and Vivek R. Narasayya. 2007. Self-Tuning Database Systems: A Decade of Progress. In VLDB. 3--14.
[20]
Chungmin Melvin Chen and Nick Roussopoulos. 1994. Adaptive Selectivity Estimation Using Query Feedback. In SIGMOD. ACM, New York, NY, USA, 161--172.
[21]
Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In SIGKDD. 785--794.
[22]
Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism: a workload-driven approach to database replication and partitioning. PVLDB 3, 1--2 (2010), 48--57.
[23]
Benoît Dageville, Dinesh Das, Karl Dias, Khaled Yagoub, Mohamed Zaït, and Mohamed Ziauddin. 2004. Automatic SQL Tuning in Oracle 10g. In VLDB. 1098--1109. http://www.vldb.org/conf/2004/IND4P2.PDF
[24]
Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, and Surajit Chaudhuri. 2019. Automatically Indexing Millions of Databases in Microsoft Azure SQL Database. In SIGMOD. ACM.
[25]
Debabrata Dash, Neoklis Polyzotis, and Anastasia Ailamaki. 2011. CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads. PVLDB 4, 6 (2011), 362--372.
[26]
Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2018. Plan Stitch: Harnessing the Best of Many Plans. PVLDB 11, 10 (2018), 1123--1136.
[27]
Jennie Duggan, Ugur Çetintemel, Olga Papaemmanouil, and Eli Upfal. 2011. Performance prediction for concurrent database workloads. In SIGMOD. 337--348.
[28]
Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R. Narasayya, and Manoj Syamala. 2018. Columnstore and B+ tree - Are Hybrid Physical Designs Important?. In SIGMOD. 177--190.
[29]
Kareem El Gebaly and Ashraf Aboulnaga. 2008. Robustness in automatic physical database design. In EDBT. 145--156.
[30]
Sheldon J. Finkelstein, Mario Schkolnick, and Paolo Tiberio. 1988. Physical Database Design for Relational Databases. ACM Trans. Database Syst. 13, 1 (1988), 91--128.
[31]
Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael Jordan, and David Patterson. 2009. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE. IEEE Computer Society, Washington, DC, USA, 592--603.
[32]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.
[33]
Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In SIGKDD. 855--864.
[34]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV. 1026--1034.
[35]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778.
[36]
Kurt Hornik. 1991. Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 2 (1991), 251--257.
[37]
Kurt Hornik, Maxwell B. Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359--366.
[38]
Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2007. Database Cracking. In CIDR. 68--78.
[39]
Stratos Idreos, Stefan Manegold, and Goetz Graefe. 2012. Adaptive indexing in modern database kernels. In EDBT. 566--569.
[40]
Stratos Idreos, Stefan Manegold, Harumi A. Kuno, and Goetz Graefe. 2011. Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores. PVLDB 4, 9 (2011), 585--597.
[41]
Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, and Demi Guo. 2018. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. In SIGMOD. 535--550.
[42]
Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In NIPS. 3149--3157. http://papers.nips.cc/paper/ 6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree
[43]
Keras 2018. Keras: The Python Deep Learning library. https://keras.io/.
[44]
Michael S. Kester, Manos Athanassoulis, and Stratos Idreos. 2017. Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?. In SIGMOD. 715--730.
[45]
Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014). arXiv:1412.6980 http://arxiv.org/abs/1412.6980
[46]
Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018. 489--504.
[47]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (Nov. 2015), 204--215.
[48]
Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE. 38-- 49.
[49]
Jiexing Li, Arnd Christian König, Vivek R. Narasayya, and Surajit Chaudhuri. 2012. Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques. PVLDB 5, 11 (2012), 1555-- 1566.
[50]
Guy Lohman. 2014. Is Query Optimization a "Solved" Problem? http: //wp.sigmod.org/?p=1075.
[51]
ML.NET 2018. Machine Learning for .NET. https://github.com/dotnet/ machinelearning.
[52]
Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self- Driving Database Management Systems. In CIDR.
[53]
Lorien Y. Pratt. 1992. Discriminability-Based Transfer between Neural Networks. In NIPS. 204--211. http://papers.nips.cc/paper/ 641-discriminability-based-transfer-between-neural-networks
[54]
Program for TPC-H Data Generation with Skew {n. d.}. Program for TPC-H Data Generation with Skew. https://www.microsoft.com/ en-us/download/details.aspx?id=52430.
[55]
Jun Rao, Chun Zhang, Nimrod Megiddo, and Guy M. Lohman. 2002. Automating physical database design in a parallel database. In SIGMOD. 558--569.
[56]
Karl Schnaitter, Serge Abiteboul, Tova Milo, and Neoklis Polyzotis. 2006. COLT: continuous on-line tuning. In SIGMOD. 793--795.
[57]
scikit-learn 2018. scikit-learn: Machine Learning in Python. http: //scikit-learn.org/stable/.
[58]
Ankur Sharma, Felix Martin Schuhknecht, and Jens Dittrich. 2018. The Case for Automatic Database Administration using Deep Reinforcement Learning. CoRR abs/1801.05643 (2018).
[59]
Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training Very Deep Networks. In NIPS. 2377--2385. http://papers.nips. cc/paper/5850-training-very-deep-networks
[60]
Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2's LEarning Optimizer. In VLDB. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 19--28.
[61]
Michael Stonebraker. 1974. The choice of partial inversions and combined indices. International Journal of Parallel Programming 3, 2 (1974), 167--188.
[62]
Adam J Storm, Christian Garcia-Arellano, Sam S Lightstone, Yixin Diao, and Maheswaran Surendra. 2006. Adaptive self-tuning memory in DB2. In VLDB. VLDB Endowment, 1081--1092.
[63]
TPC Benchmark DS: Standard Specification v2.6.0. {n. d.}. TPC Benchmark DS: Standard Specification v2.6.0. http://www.tpc.org/tpcds/.
[64]
TPC Benchmark H: Standard Specification v2.17.3. {n. d.}. TPC Benchmark H: Standard Specification v2.17.3. http://www.tpc.org/tpch/ default.asp.
[65]
Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. 2000. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE. 101--110.
[66]
Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In SIGMOD. ACM, New York, NY, USA, 1009--1024.
[67]
Gerhard Weikum, Axel Moenkeberg, Christof Hasse, and Peter Zabback. 2002. Self-tuning database technology and information services: from wishful thinking to viable engineering. In VLDB. Elsevier, 20--31.
[68]
Wentao Wu, Yun Chi, Shenghuo Zhu, Jun'ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting Query Execution Time: Are Optimizer Cost Models Really Unusable?. In ICDE. IEEE Computer Society, Washington, DC, USA, 1081--1092.
[69]
Wentao Wu, Jeffrey F. Naughton, and Harneet Singh. 2016. Sampling- Based Query Re-Optimization. In SIGMOD. ACM, New York, NY, USA, 1721--1736.
[70]
Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks?. In NIPS. 3320--3328. http://papers.nips.cc/paper/ 5347-how-transferable-are-features-in-deep-neural-networks
[71]
Haijun Zhang, Shuang Wang, Xiaofei Xu, Tommy W. S. Chow, and Q. M. Jonathan Wu. 2018. Tree2Vector: Learning a Vectorial Representation for Tree-Structured Data. IEEE Trans. Neural Netw. Learning Syst. 29, 11 (2018), 5304--5318.
[72]
Daniel C. Zilio, Jun Rao, Sam Lightstone, Guy Lohman, Adam Storm, Christian Garcia-Arellano, and Scott Fadden. 2004. DB2 Design Advisor: Integrated Automatic Physical Database Design. In VLDB. VLDB Endowment, 1087--1097.

Cited By

View all
  • (2025)Hybrid Cost Modeling for Reducing Query Performance Regression in Index TuningIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348495437:1(379-391)Online publication date: Jan-2025
  • (2024)Learning From User-Specified Optimizer Hints in Database SystemsFoundations of Computing and Decision Sciences10.2478/fcds-2024-001149:2(181-197)Online publication date: 26-May-2024
  • (2024)DBG-PT: A Large Language Model Assisted Query Performance Regression DebuggerProceedings of the VLDB Endowment10.14778/3685800.368586917:12(4337-4340)Online publication date: 8-Nov-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data
June 2019
2106 pages
ISBN:9781450356435
DOI:10.1145/3299869
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2019

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. automated indexing
  2. autonomous database management
  3. performance tuning
  4. relational database-as-a-service

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '19
Sponsor:
SIGMOD/PODS '19: International Conference on Management of Data
June 30 - July 5, 2019
Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)204
  • Downloads (Last 6 weeks)14
Reflects downloads up to 08 Feb 2025

Other Metrics

Citations

Cited By

View all
  • (2025)Hybrid Cost Modeling for Reducing Query Performance Regression in Index TuningIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348495437:1(379-391)Online publication date: Jan-2025
  • (2024)Learning From User-Specified Optimizer Hints in Database SystemsFoundations of Computing and Decision Sciences10.2478/fcds-2024-001149:2(181-197)Online publication date: 26-May-2024
  • (2024)DBG-PT: A Large Language Model Assisted Query Performance Regression DebuggerProceedings of the VLDB Endowment10.14778/3685800.368586917:12(4337-4340)Online publication date: 8-Nov-2024
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • (2024)Refactoring Index Tuning Process with Benefit EstimationProceedings of the VLDB Endowment10.14778/3654621.365462217:7(1528-1541)Online publication date: 1-Mar-2024
  • (2024)Eraser: Eliminating Performance Regression on Learned Query OptimizerProceedings of the VLDB Endowment10.14778/3641204.364120517:5(926-938)Online publication date: 1-Jan-2024
  • (2024)A Comparative Study and Component Analysis of Query Plan Representation Techniques in ML4DB StudiesProceedings of the VLDB Endowment10.14778/3636218.363623517:4(823-835)Online publication date: 5-Mar-2024
  • (2024)Backbone Index and GNN Models for Skyline Path Query Evaluation over Multi-cost Road NetworksACM Transactions on Spatial Algorithms and Systems10.1145/366063210:4(1-45)Online publication date: 23-Apr-2024
  • (2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • Show More Cited By

View Options

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Figures

Tables

Media

Share

Share

Share this Publication link

Share on social media