research-article

AI Meets AI: Leveraging Query Executions to Improve Index Recommendations

Authors:

Surajit Chaudhuri,

Vivek R. NarasayyaAuthors Info & Claims

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

Pages 1241 - 1258

https://doi.org/10.1145/3299869.3324957

Published: 25 June 2019 Publication History

Abstract

State-of-the-art index tuners rely on query optimizer's cost estimates to search for the index configuration with the largest estimated execution cost improvement`. Due to well-known limitations in optimizer's estimates, in a significant fraction of cases, an index estimated to improve a query's execution cost, e.g., CPU time, makes that worse when implemented. Such errors are a major impediment for automated indexing in production systems. We observe that comparing the execution cost of two plans of the same query corresponding to different index configurations is a key step during index tuning. Instead of using optimizer's estimates for such comparison, our key insight is that formulating it as a classification task in machine learning results in significantly higher accuracy. We present a study of the design space for this classification problem. We further show how to integrate this classifier into the state-of-the-art index tuners with minimal modifications, i.e., how artificial intelligence (AI) can benefit automated indexing (AI). Our evaluation using industry-standard benchmarks and a large number of real customer workloads demonstrates up to 5x reduction in the errors in identifying the cheaper plan in a pair, which eliminates almost all query execution cost regressions when the model is used in index tuning.

References

[1]

Martín Abadi et al. 2016. TensorFlow: A System for Large-Scale Machine Learning. In OSDI. 265--283. https://www.usenix.org/conference/ osdi16/technical-sessions/presentation/abadi

[2]

Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollár, Arunprasad P. Marathe, Vivek R. Narasayya, and Manoj Syamala. 2004. Database Tuning Advisor for Microsoft SQL Server 2005. In VLDB. 1110--1121.

[3]

Sanjay Agrawal, Eric Chu, and Vivek Narasayya. 2006. Automatic physical design tuning: workload as a sequence. In Proceedings of the 2006 ACM SIGMOD international conference on Management of data. ACM, 683--694.

Digital Library

[4]

Sanjay Agrawal, Vivek R. Narasayya, and Beverly Yang. 2004. Integrating Vertical and Horizontal Partitioning Into Automated Physical Database Design. In SIGMOD. 359--370.

Digital Library

[5]

Mert Akdere, Ugur Çetintemel, Matteo Riondato, Eli Upfal, and Stanley B. Zdonik. 2012. Learning-based Query Performance Modeling and Prediction. In ICDE. IEEE Computer Society, Washington, DC, USA, 390--401.

Digital Library

[6]

Ioannis Alagiannis, Stratos Idreos, and Anastasia Ailamaki. 2014. H2O: a hands-free adaptive store. In SIGMOD. 1103--1114.

[7]

Joy Arulraj, Andrew Pavlo, and Prashanth Menon. 2016. Bridging the Archipelago between Row-Stores and Column-Stores for Hybrid Workloads. In SIGMOD. 583--598.

[8]

Azure SQL Database {n. d.}. Azure SQL Database. https://azure. microsoft.com/en-us/services/sql-database/.

[9]

Christopher M. Bishop. 2007. Pattern recognition and machine learning, 5th Edition. Springer. http://www.worldcat.org/oclc/71008143

[10]

Renata Borovica, Ioannis Alagiannis, and Anastasia Ailamaki. 2012. Automated physical designers: what you see is (not) what you get. In DBTest. 9.

[11]

Kurt P. Brown, Michael J. Carey, and Miron Livny. 1996. Goal-oriented Buffer Management Revisited. In SIGMOD. ACM, New York, NY, USA, 353--364.

Digital Library

[12]

Nicolas Bruno and Surajit Chaudhuri. 2007. An Online Approach to Physical Design Tuning. In ICDE. 826--835.

[13]

Nicolas Bruno and Surajit Chaudhuri. 2007. Physical design refinement: The 'merge-reduce' approach. ACM Trans. Database Syst. 32, 4 (2007), 28.

Digital Library

[14]

Surajit Chaudhuri. 1998. An Overview of Query Optimization in Relational Systems. In PODS. ACM, New York, NY, USA, 34--43.

Digital Library

[15]

Surajit Chaudhuri. 2009. Query optimizers: time to rethink the contract?. In SIGMOD. 961--968.

Digital Library

[16]

Surajit Chaudhuri, Vivek Narasayya, and Ravi Ramamurthy. 2008. A pay-as-you-go framework for query execution feedback. PVLDB 1, 1 (2008), 1141--1152.

Digital Library

[17]

Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost- Driven Index Selection Tool for Microsoft SQL Server. In VLDB. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 146--155.

Digital Library

[18]

Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD. 367--378.

Digital Library

[19]

Surajit Chaudhuri and Vivek R. Narasayya. 2007. Self-Tuning Database Systems: A Decade of Progress. In VLDB. 3--14.

Digital Library

[20]

Chungmin Melvin Chen and Nick Roussopoulos. 1994. Adaptive Selectivity Estimation Using Query Feedback. In SIGMOD. ACM, New York, NY, USA, 161--172.

[21]

Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In SIGKDD. 785--794.

Digital Library

[22]

Carlo Curino, Evan Jones, Yang Zhang, and Sam Madden. 2010. Schism: a workload-driven approach to database replication and partitioning. PVLDB 3, 1--2 (2010), 48--57.

Digital Library

[23]

Benoît Dageville, Dinesh Das, Karl Dias, Khaled Yagoub, Mohamed Zaït, and Mohamed Ziauddin. 2004. Automatic SQL Tuning in Oracle 10g. In VLDB. 1098--1109. http://www.vldb.org/conf/2004/IND4P2.PDF

Digital Library

[24]

Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, and Surajit Chaudhuri. 2019. Automatically Indexing Millions of Databases in Microsoft Azure SQL Database. In SIGMOD. ACM.

[25]

Debabrata Dash, Neoklis Polyzotis, and Anastasia Ailamaki. 2011. CoPhy: A Scalable, Portable, and Interactive Index Advisor for Large Workloads. PVLDB 4, 6 (2011), 362--372.

Digital Library

[26]

Bailu Ding, Sudipto Das, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2018. Plan Stitch: Harnessing the Best of Many Plans. PVLDB 11, 10 (2018), 1123--1136.

Digital Library

[27]

Jennie Duggan, Ugur Çetintemel, Olga Papaemmanouil, and Eli Upfal. 2011. Performance prediction for concurrent database workloads. In SIGMOD. 337--348.

Digital Library

[28]

Adam Dziedzic, Jingjing Wang, Sudipto Das, Bolin Ding, Vivek R. Narasayya, and Manoj Syamala. 2018. Columnstore and B+ tree - Are Hybrid Physical Designs Important?. In SIGMOD. 177--190.

[29]

Kareem El Gebaly and Ashraf Aboulnaga. 2008. Robustness in automatic physical database design. In EDBT. 145--156.

Digital Library

[30]

Sheldon J. Finkelstein, Mario Schkolnick, and Paolo Tiberio. 1988. Physical Database Design for Relational Databases. ACM Trans. Database Syst. 13, 1 (1988), 91--128.

Digital Library

[31]

Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L. Wiener, Armando Fox, Michael Jordan, and David Patterson. 2009. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE. IEEE Computer Society, Washington, DC, USA, 592--603.

Digital Library

[32]

Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning. MIT Press. http://www.deeplearningbook.org.

Digital Library

[33]

Aditya Grover and Jure Leskovec. 2016. node2vec: Scalable Feature Learning for Networks. In SIGKDD. 855--864.

Digital Library

[34]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. In ICCV. 1026--1034.

Digital Library

[35]

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep Residual Learning for Image Recognition. In CVPR. 770--778.

[36]

Kurt Hornik. 1991. Approximation capabilities of multilayer feedforward networks. Neural Networks 4, 2 (1991), 251--257.

Digital Library

[37]

Kurt Hornik, Maxwell B. Stinchcombe, and Halbert White. 1989. Multilayer feedforward networks are universal approximators. Neural Networks 2, 5 (1989), 359--366.

[38]

Stratos Idreos, Martin L. Kersten, and Stefan Manegold. 2007. Database Cracking. In CIDR. 68--78.

[39]

Stratos Idreos, Stefan Manegold, and Goetz Graefe. 2012. Adaptive indexing in modern database kernels. In EDBT. 566--569.

[40]

Stratos Idreos, Stefan Manegold, Harumi A. Kuno, and Goetz Graefe. 2011. Merging What's Cracked, Cracking What's Merged: Adaptive Indexing in Main-Memory Column-Stores. PVLDB 4, 9 (2011), 585--597.

[41]

Stratos Idreos, Kostas Zoumpatianos, Brian Hentschel, Michael S. Kester, and Demi Guo. 2018. The Data Calculator: Data Structure Design and Cost Synthesis from First Principles and Learned Cost Models. In SIGMOD. 535--550.

[42]

Guolin Ke, Qi Meng, Thomas Finley, Taifeng Wang, Wei Chen, Weidong Ma, Qiwei Ye, and Tie-Yan Liu. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In NIPS. 3149--3157. http://papers.nips.cc/paper/ 6907-lightgbm-a-highly-efficient-gradient-boosting-decision-tree

[43]

Keras 2018. Keras: The Python Deep Learning library. https://keras.io/.

[44]

Michael S. Kester, Manos Athanassoulis, and Stratos Idreos. 2017. Access Path Selection in Main-Memory Optimized Data Systems: Should I Scan or Should I Probe?. In SIGMOD. 715--730.

Digital Library

[45]

Diederik P. Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. CoRR abs/1412.6980 (2014). arXiv:1412.6980 http://arxiv.org/abs/1412.6980

[46]

Tim Kraska, Alex Beutel, Ed H. Chi, Jeffrey Dean, and Neoklis Polyzotis. 2018. The Case for Learned Index Structures. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD Conference 2018, Houston, TX, USA, June 10--15, 2018. 489--504.

Digital Library

[47]

Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB 9, 3 (Nov. 2015), 204--215.

Digital Library

[48]

Viktor Leis, Alfons Kemper, and Thomas Neumann. 2013. The adaptive radix tree: ARTful indexing for main-memory databases. In ICDE. 38-- 49.

[49]

Jiexing Li, Arnd Christian König, Vivek R. Narasayya, and Surajit Chaudhuri. 2012. Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques. PVLDB 5, 11 (2012), 1555-- 1566.

Digital Library

[50]

Guy Lohman. 2014. Is Query Optimization a "Solved" Problem? http: //wp.sigmod.org/?p=1075.

[51]

ML.NET 2018. Machine Learning for .NET. https://github.com/dotnet/ machinelearning.

[52]

Andrew Pavlo, Gustavo Angulo, Joy Arulraj, Haibin Lin, Jiexi Lin, Lin Ma, Prashanth Menon, Todd C. Mowry, Matthew Perron, Ian Quah, Siddharth Santurkar, Anthony Tomasic, Skye Toor, Dana Van Aken, Ziqi Wang, Yingjun Wu, Ran Xian, and Tieying Zhang. 2017. Self- Driving Database Management Systems. In CIDR.

[53]

Lorien Y. Pratt. 1992. Discriminability-Based Transfer between Neural Networks. In NIPS. 204--211. http://papers.nips.cc/paper/ 641-discriminability-based-transfer-between-neural-networks

Digital Library

[54]

Program for TPC-H Data Generation with Skew {n. d.}. Program for TPC-H Data Generation with Skew. https://www.microsoft.com/ en-us/download/details.aspx?id=52430.

[55]

Jun Rao, Chun Zhang, Nimrod Megiddo, and Guy M. Lohman. 2002. Automating physical database design in a parallel database. In SIGMOD. 558--569.

Digital Library

[56]

Karl Schnaitter, Serge Abiteboul, Tova Milo, and Neoklis Polyzotis. 2006. COLT: continuous on-line tuning. In SIGMOD. 793--795.

Digital Library

[57]

scikit-learn 2018. scikit-learn: Machine Learning in Python. http: //scikit-learn.org/stable/.

[58]

Ankur Sharma, Felix Martin Schuhknecht, and Jens Dittrich. 2018. The Case for Automatic Database Administration using Deep Reinforcement Learning. CoRR abs/1801.05643 (2018).

[59]

Rupesh Kumar Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training Very Deep Networks. In NIPS. 2377--2385. http://papers.nips. cc/paper/5850-training-very-deep-networks

Digital Library

[60]

Michael Stillger, Guy M. Lohman, Volker Markl, and Mokhtar Kandil. 2001. LEO - DB2's LEarning Optimizer. In VLDB. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 19--28.

Digital Library

[61]

Michael Stonebraker. 1974. The choice of partial inversions and combined indices. International Journal of Parallel Programming 3, 2 (1974), 167--188.

[62]

Adam J Storm, Christian Garcia-Arellano, Sam S Lightstone, Yixin Diao, and Maheswaran Surendra. 2006. Adaptive self-tuning memory in DB2. In VLDB. VLDB Endowment, 1081--1092.

Digital Library

[63]

TPC Benchmark DS: Standard Specification v2.6.0. {n. d.}. TPC Benchmark DS: Standard Specification v2.6.0. http://www.tpc.org/tpcds/.

[64]

TPC Benchmark H: Standard Specification v2.17.3. {n. d.}. TPC Benchmark H: Standard Specification v2.17.3. http://www.tpc.org/tpch/ default.asp.

[65]

Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. 2000. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE. 101--110.

Digital Library

[66]

Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In SIGMOD. ACM, New York, NY, USA, 1009--1024.

Digital Library

[67]

Gerhard Weikum, Axel Moenkeberg, Christof Hasse, and Peter Zabback. 2002. Self-tuning database technology and information services: from wishful thinking to viable engineering. In VLDB. Elsevier, 20--31.

Digital Library

[68]

Wentao Wu, Yun Chi, Shenghuo Zhu, Jun'ichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting Query Execution Time: Are Optimizer Cost Models Really Unusable?. In ICDE. IEEE Computer Society, Washington, DC, USA, 1081--1092.

Digital Library

[69]

Wentao Wu, Jeffrey F. Naughton, and Harneet Singh. 2016. Sampling- Based Query Re-Optimization. In SIGMOD. ACM, New York, NY, USA, 1721--1736.

Digital Library

[70]

Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How transferable are features in deep neural networks?. In NIPS. 3320--3328. http://papers.nips.cc/paper/ 5347-how-transferable-are-features-in-deep-neural-networks

Digital Library

[71]

Haijun Zhang, Shuang Wang, Xiaofei Xu, Tommy W. S. Chow, and Q. M. Jonathan Wu. 2018. Tree2Vector: Learning a Vectorial Representation for Tree-Structured Data. IEEE Trans. Neural Netw. Learning Syst. 29, 11 (2018), 5304--5318.

[72]

Daniel C. Zilio, Jun Rao, Sam Lightstone, Guy Lohman, Adam Storm, Christian Garcia-Arellano, and Scott Fadden. 2004. DB2 Design Advisor: Integrated Automatic Physical Database Design. In VLDB. VLDB Endowment, 1087--1097.

Digital Library

Cited By

Wu W(2025)Hybrid Cost Modeling for Reducing Query Performance Regression in Index TuningIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348495437:1(379-391)Online publication date: Jan-2025
https://doi.org/10.1109/TKDE.2024.3484954
Zakrzewicz M(2024)Learning From User-Specified Optimizer Hints in Database SystemsFoundations of Computing and Decision Sciences10.2478/fcds-2024-001149:2(181-197)Online publication date: 26-May-2024
https://doi.org/10.2478/fcds-2024-0011
Giannakouris VTrummer I(2024)DBG-PT: A Large Language Model Assisted Query Performance Regression DebuggerProceedings of the VLDB Endowment10.14778/3685800.368586917:12(4337-4340)Online publication date: 8-Nov-2024
https://dl.acm.org/doi/10.14778/3685800.3685869
Show More Cited By

Index Terms

AI Meets AI: Leveraging Query Executions to Improve Index Recommendations
1. Computer systems organization
  1. Architectures
    1. Distributed architectures
      1. Cloud computing
2. Information systems
  1. Data management systems
    1. Data structures
      1. Data layout
    2. Database administration
      1. Autonomous database administration

Recommendations

Automatically Indexing Millions of Databases in Microsoft Azure SQL Database
SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

An appropriate set of indexes can result in orders of magnitude better query performance. Index management is a challenging task even for expert human administrators. Fully automating this process is of significant value. We describe the challenges, ...
An index selection method without repeated optimizer estimations

The index selection problem (ISP) concerns the selection of an appropriate index set to minimize the total cost for a given workload containing read and update queries. Since the ISP has been proven to be an NP-hard problem, most studies focus on ...
Learning an Index Advisor with Deep Reinforcement Learning
Web and Big Data
Abstract
Indexes are crucial for the efficient processing of database workloads and an appropriately selected set of indexes can drastically improve query processing performance. However, the selection of beneficial indexes is a non-trivial problem and ...

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences

SIGMOD '19: Proceedings of the 2019 International Conference on Management of Data

June 2019

2106 pages

ISBN:9781450356435

DOI:10.1145/3299869

General Chairs:
Peter Boncz
CWI & Vrije Universiteit Amsterdam, The Netherlands
,
Stefan Manegold
CWI & Universiteit Leiden, The Netherlands
,
Program Chairs:
Anastasia Ailamaki
EPFL, Switzerland
,
Amol Deshpande
University of Maryland, USA
,
Tim Kraska
MIT, USA

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 25 June 2019

Permissions

Request permissions for this article.

Request Permissions

Check for updates

Author Tags

Qualifiers

Research-article

Conference

SIGMOD/PODS '19

Sponsor:

SIGMOD

SIGMOD/PODS '19: International Conference on Management of Data

June 30 - July 5, 2019

Amsterdam, Netherlands

Acceptance Rates

SIGMOD '19 Paper Acceptance Rate 88 of 430 submissions, 20%;

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

100
Total Citations
View Citations
1,265
Total Downloads

Downloads (Last 12 months)204
Downloads (Last 6 weeks)14

Reflects downloads up to 08 Feb 2025

Other Metrics

View Author Metrics

Citations

Cited By

Wu W(2025)Hybrid Cost Modeling for Reducing Query Performance Regression in Index TuningIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2024.348495437:1(379-391)Online publication date: Jan-2025
https://doi.org/10.1109/TKDE.2024.3484954
Zakrzewicz M(2024)Learning From User-Specified Optimizer Hints in Database SystemsFoundations of Computing and Decision Sciences10.2478/fcds-2024-001149:2(181-197)Online publication date: 26-May-2024
https://doi.org/10.2478/fcds-2024-0011
Giannakouris VTrummer I(2024)DBG-PT: A Large Language Model Assisted Query Performance Regression DebuggerProceedings of the VLDB Endowment10.14778/3685800.368586917:12(4337-4340)Online publication date: 8-Nov-2024
https://dl.acm.org/doi/10.14778/3685800.3685869
Zhang WLim WButrovich MPavlo A(2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
https://doi.org/10.14778/3681954.3682007
Yu TZou ZSun WYan Y(2024)Refactoring Index Tuning Process with Benefit EstimationProceedings of the VLDB Endowment10.14778/3654621.365462217:7(1528-1541)Online publication date: 1-Mar-2024
https://dl.acm.org/doi/10.14778/3654621.3654622
Weng LZhu RWu DDing BZheng BZhou J(2024)Eraser: Eliminating Performance Regression on Learned Query OptimizerProceedings of the VLDB Endowment10.14778/3641204.364120517:5(926-938)Online publication date: 1-Jan-2024
https://dl.acm.org/doi/10.14778/3641204.3641205
Zhao YLi ZCong G(2024)A Comparative Study and Component Analysis of Query Plan Representation Techniques in ML4DB StudiesProceedings of the VLDB Endowment10.14778/3636218.363623517:4(823-835)Online publication date: 5-Mar-2024
https://doi.org/10.14778/3636218.3636235
Gong QChen HCao HLiu J(2024)Backbone Index and GNN Models for Skyline Path Query Evaluation over Multi-cost Road NetworksACM Transactions on Spatial Algorithms and Systems10.1145/366063210:4(1-45)Online publication date: 23-Apr-2024
https://dl.acm.org/doi/10.1145/3660632
Wang XWu WWang CNarasayya VChaudhuri S(2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654985
Siddiqui TWu W(2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
https://dl.acm.org/doi/10.1145/3641832.3641836
Show More Cited By

View Options

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Publication

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Figures

Tables

Media

View Table of Conten