research-article

Open access

Wred: Workload Reduction for Scalable Index Tuning

Authors:

Matteo Brucato,

Tarique Siddiqui,

Vivek Narasayya,

Surajit ChaudhuriAuthors Info & Claims

Proceedings of the ACM on Management of Data, Volume 2, Issue 1

Article No.: 50, Pages 1 - 26

https://doi.org/10.1145/3639305

Published: 26 March 2024 Publication History

Abstract

Modern database systems offer index-tuning advisors that automatically identify a set of indexes to improve workload performance. Advisors leverage the optimizer's what-if API to optimize a query for a hypothetical index configuration. Because what-if calls constitute a major bottleneck of index tuning, existing techniques, such as workload compression, help reduce the number of what-if calls to speed up tuning. Unfortunately, even with small workloads and few what-if calls, tuning can still take hours due to the complexity of the queries (e.g., the number of joins, filters, group-by and order-by clauses), which increases their optimization time. This paper introduces workload reduction, a new complementary technique aimed at expediting index tuning by decreasing individual what-if call time without significantly affecting the quality of index tuning. We present an efficient workload reduction algorithm, called Wred, which rewrites each query in the original workload to eliminate column and table expressions unlikely to benefit from indexes, thereby accelerating what-if calls. We study its complexity and ability to maintain high index quality. We perform an extensive evaluation over industry benchmarks and real-world customer workloads, which shows that Wred results in a 3x median speedup in tuning efficiency over an industrial-strength state-of-the-art index advisor, with only a 3.7% median loss in improvement---where improvement is the total workload cost as estimated by the query optimizer---and results in up to 24.7x speedup with 1.8% improvement loss. Furthermore, combining Wred and Isum (a state-of-the-art workload compression technique for index tuning) results in higher speedups than either of the two techniques alone, with 10.5x median speedup and 5% median improvement loss.

References

[1]

Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollar, Arun Marathe, Vivek Narasayya, and Manoj Syamala. 2005. Database tuning advisor for microsoft sql server 2005. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 930--932.

Digital Library

[2]

Sanjay Agrawal, Surajit Chaudhuri, and Vivek R Narasayya. 2000. Automated selection of materialized views and indexes in SQL databases. In VLDB, Vol. 2000. 496--505.

[3]

Matteo Brucato, Tarique Siddiqui, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri. 2024. Wred: Workload Reduction for Scalable Index Tuning (extended version). https://matteo-brucato.github.io/files/wred_extended.

[4]

Nicolas Bruno and Surajit Chaudhuri. 2005. Automatic Physical Database Tuning: A Relaxation-based Approach. In SIGMOD. 227--238.

[5]

Nicolas Bruno and Surajit Chaudhuri. 2006. To tune or not to tune? A Lightweight Physical Design Alerter. In Proceedings of the 32nd international conference on Very large data bases. Citeseer, 499--510.

[6]

Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y Vardi. 2000. What is query rewriting?. In Cooperative Information Agents IV-The Future of Information Agents in Cyberspace: 4th International Workshop, CIA 2000, Boston, MA, USA, July 7--9, 2000. Proceedings 4. Springer, 51--59.

[7]

Surajit Chaudhuri, Ashish Kumar Gupta, and Vivek Narasayya. 2002. Compressing SQL workloads. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data. 488--499.

Digital Library

[8]

Surajit Chaudhuri, Ashish Kumar Gupta, and Vivek R. Narasayya. 2002. Compressing SQL workloads. In SIGMOD. 488--499.

[9]

Surajit Chaudhuri and Vivek Narasayya. 2020. Anytime Algorithm of Database Tuning Advisor for Microsoft SQL Server. (June 2020). https://www.microsoft.com/en-us/research/publication/anytime-algorithm-of-database-tuning-advisor-for-microsoft-sql-server/

[10]

Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB 1997. 146--155. http://www.vldb.org/conf/1997/P146.PDF

Digital Library

[11]

Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD. 367--378.

[12]

Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Jeffrey Naughton, and Stratis Viglas. 2020. Comprehensive and efficient workload compression. arXiv preprint arXiv:2011.05549 (2020).

[13]

Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Jeffrey F. Naughton, and Stratis Viglas. 2020. Comprehensive and Efficient Workload Compression. Proc. VLDB Endow. 14, 3 (2020), 418--430.

Digital Library

[14]

Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2019. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. In SIGMOD. 1241--1258.

Digital Library

[15]

Anna Fariha, Sheikh Muhammad Sarwar, and Alexandra Meliou. 2018. SQuID: Semantic similarity-aware query intent discovery. In Proceedings of the 2018 International Conference on Management of Data. 1745--1748.

Digital Library

[16]

Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R Woodward, John Drake, and Qiaofu Zhang. 2021. Natural SQL: Making SQL easier to infer from natural language specifications. arXiv preprint arXiv:2109.05153 (2021).

[17]

Antara Ghosh, Jignashu Parikh, Vibhuti S Sengar, and Jayant R Haritsa. 2002. Plan selection based on query clustering. In VLDB'02: Proceedings of the 28th International Conference on Very Large Databases. Elsevier, 179--190.

[18]

Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui. 2021. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. Proc. VLDB Endow. 15, 4 (2021), 752--765. https://doi.org/10.14778/3503585.3503586

Digital Library

[19]

Shrainik Jain, Bill Howe, Jiaqi Yan, and Thierry Cruanes. 2018. Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics. arXiv preprint arXiv:1801.05613 (2018).

[20]

Andrew Kane. 2017. The automatic indexer for Postgres. https://github.com/ankane/dexter.

[21]

Andrew Kane. 2017. Introducing Dexter, the Automatic Indexer for Postgres. (June 2017). https://medium.com/@ankane/introducing-dexter-the-automatic-indexer-for-postgres-5f8fa8b28f27 last accessed November 2022.

[22]

Andrew Kane. 2017. Introducing Dexter, the Automatic Indexer for Postgres. https://medium.com/@ankane/introducing-dexter-the-automatic-indexer-for-postgres-5f8fa8b28f27.

[23]

Richard M Karp. 1972. Reducibility among combinatorial problems, Complexity of computer computations (RE Miller and JW Thatcher, editors).

[24]

Richard M Karp. 2010. Reducibility among combinatorial problems. Springer.

[25]

Hyeonji Kim, Byeong-Hoon So, Wook-Shin Han, and Hongrae Lee. 2020. Natural language to SQL: Where are we today? Proceedings of the VLDB Endowment 13, 10 (2020), 1737--1750.

Digital Library

[26]

Jan Kossmann, Stefan Halfpap, Marcel Jankrift, and Rainer Schlosser. 2020. Magic Mirror in My Hand, Which is The Best in The Land? An Experimental Evaluation of Index Selection Algorithms. VLDB 13, 12 (2020), 2382--2395. http://www.vldb.org/pvldb/vol13/p2382-kossmann.pdf

Digital Library

[27]

Gokhan Kul, Duc Luong, Ting Xie, Patrick Coonan, Varun Chandola, Oliver Kennedy, and Shambhu Upadhyaya. 2016. Ettu: Analyzing query intents in corporate databases. In Proceedings of the 25th international conference companion on world wide web. 463--466.

Digital Library

[28]

Hao Li, Chee-Yong Chan, and David Maier. 2015. Query from examples: An iterative, data-driven approach to query construction. Proceedings of the VLDB Endowment 8, 13 (2015), 2158--2169.

Digital Library

[29]

Jiexing Li, Arnd Christian König, Vivek Narasayya, and Surajit Chaudhuri. 2012. Robust Estimation of Resource Consumption for SQL Queries Using Statistical Techniques. Proc. VLDB Endow. 5, 11 (jul 2012), 1555--1566. https://doi.org/10.14778/2350229.2350269

Digital Library

[30]

Stratos Papadomanolakis, Debabrata Dash, and Anastasia Ailamaki. 2007. Efficient use of the query optimizer for automated physical design. In Proceedings of the 33rd international conference on Very large data bases. 1093--1104.

[31]

Yannis Papakonstantinou and Vasilis Vassalos. 1999. Query rewriting for semistructured data. ACM SIGMOD Record 28, 2 (1999), 455--466.

Digital Library

[32]

Debjyoti Paul, Jie Cao, Feifei Li, and Vivek Srikumar. 2021. Database Workload Characterization with Query Plan Encoders. Proc. VLDB Endow. 15, 4 (dec 2021), 923--935. https://doi.org/10.14778/3503585.3503600

Digital Library

[33]

Ville Satopaa, Jeannie Albrecht, David Irwin, and Barath Raghavan. 2011. Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior. In 2011 31st International Conference on Distributed Computing Systems Workshops. 166--171. https://doi.org/10.1109/ICDCSW.2011.20

Digital Library

[34]

Tarique Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek Narasayya, and Surajit Chaudhuri. 2022. ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning. In SIGMOD 2022. 660--673. https://doi.org/10.1145/3514221.3526152

Digital Library

[35]

Tarique Siddiqui, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri. 2022. DISTILL: low-overhead data-driven techniques for filtering and costing indexes for scalable index tuning. Proceedings of the VLDB Endowment 15, 10 (2022), 2019--2031.

Digital Library

[36]

The TPC Benchmark?DS 2023. http://www.tpc.org/tpcds/.

[37]

The TPC Benchmark?H 2023. http://www.tpc.org/tpch/.

[38]

G. Valentin, M. Zuliani, D.C. Zilio, G. Lohman, and A. Skelley. 2000. DB2 advisor: an optimizer smart enough to recommend its own indexes. In Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073). 101--110. https://doi.org/10.1109/ICDE.2000.839397

[39]

Chengcheng Wan, Yiwen Zhu, Joyce Cahoon, Wenjing Wang, Katherine Lin, Sean Liu, Raymond Truong, Neetu Singh, Alexandra Ciortea, Konstantinos Karanasos, et al. 2023. Stitcher: Learned Workload Synthesis from Historical Performance Footprints. (2023).

[40]

Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Interactive query synthesis from input-output examples. In Proceedings of the 2017 ACM International Conference on Management of Data. 1631--1634.

Digital Library

[41]

Wentao Wu, Yun Chi, Hakan Hacígümüs, and Jeffrey F. Naughton. 2013. Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads. Proc. VLDB Endow. 6, 10 (aug 2013), 925--936. https://doi.org/10.14778/2536206.2536219

Digital Library

[42]

Wentao Wu, Yun Chi, Shenghuo Zhu, Junichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting query execution time: Are optimizer cost models really unusable?. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). 1081--1092. https://doi.org/10.1109/ICDE.2013.6544899

Digital Library

[43]

Wentao Wu, Chi Wang, Tarique Siddiqui, Junxiong Wang, Vivek Narasayya, Surajit Chaudhuri, and Philip A. Bernstein. 2022. Budget-Aware Index Tuning with Reinforcement Learning. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 1528--1541. https://doi.org/10.1145/3514221.3526128

Digital Library

[44]

Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: query synthesis from natural language. Proc. ACM Program. Lang. 1, OOPSLA (2017), 63:1--63:26. https://doi.org/10.1145/3133887

Digital Library

[45]

Xuanhe Zhou, Guoliang Li, Chengliang Chai, and Jianhua Feng. 2021. A Learned Query Rewrite System Using Monte Carlo Tree Search. Proc. VLDB Endow. 15, 1 (sep 2021), 46--58. https://doi.org/10.14778/3485450.3485456

Digital Library

Cited By

Lim WMa LZhang WButrovich MArch SPavlo A(2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682030
Wang XWu WWang CNarasayya VChaudhuri S(2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654985

Index Terms

Wred: Workload Reduction for Scalable Index Tuning
1. Information systems
  1. Data management systems
    1. Database administration
      1. Autonomous database administration
    2. Database management system engines
      1. Database query processing
        Query operators

Recommendations

Wii: Dynamic Budget Reallocation In Index Tuning
SIGMOD

Index tuning aims to find the optimal index configuration for an input workload. It is often a time-consuming and resource-intensive process, largely attributed to the huge amount of "what-if" calls made to the query optimizer during configuration ...
Budget-aware Index Tuning with Reinforcement Learning
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data

Index tuning aims to find the optimal index configuration for an input workload. It is a resource-intensive task since it requires making multiple expensive "what-if" calls to the query optimizer to estimate the cost of a query given an index ...
ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data

Today's database systems include index advisors that recommend an appropriate set of indexes for an input workload. Since index tuning on large and complex workloads can be resource-intensive and time-consuming, workload compression techniques have been ...

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data

Proceedings of the ACM on Management of Data Volume 2, Issue 1

SIGMOD

February 2024

1874 pages

EISSN:2836-6573

DOI:10.1145/3654807

Editor:
Divyakant Agrawal
UC Santa Barbara, United States

Issue’s Table of Contents

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2024

Published in PACMMOD Volume 2, Issue 1

Permissions

Request permissions for this article.

Request Permissions

Author Tags

Qualifiers

Research-article

Contributors

Other Metrics

View Article Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

1
Total Citations
View Citations
239
Total Downloads

Downloads (Last 12 months)239
Downloads (Last 6 weeks)53

Reflects downloads up to 30 Aug 2024

Other Metrics

View Author Metrics

Citations

Cited By

Lim WMa LZhang WButrovich MArch SPavlo A(2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
https://dl.acm.org/doi/10.14778/3681954.3682030
Wang XWu WWang CNarasayya VChaudhuri S(2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024
https://dl.acm.org/doi/10.1145/3654985

View Options

View options

PDF

View or Download as a PDF file.

eReader

View online with eReader.

Get Access

Login options

Check if you have access through your login credentials or your institution to get full access on this article.

Full Access

Get this Article

Media

Figures

Other

Tables

View Issue’s Table of Contents