Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
research-article
Open access

Wred: Workload Reduction for Scalable Index Tuning

Published: 26 March 2024 Publication History

Abstract

Modern database systems offer index-tuning advisors that automatically identify a set of indexes to improve workload performance. Advisors leverage the optimizer's what-if API to optimize a query for a hypothetical index configuration. Because what-if calls constitute a major bottleneck of index tuning, existing techniques, such as workload compression, help reduce the number of what-if calls to speed up tuning. Unfortunately, even with small workloads and few what-if calls, tuning can still take hours due to the complexity of the queries (e.g., the number of joins, filters, group-by and order-by clauses), which increases their optimization time. This paper introduces workload reduction, a new complementary technique aimed at expediting index tuning by decreasing individual what-if call time without significantly affecting the quality of index tuning. We present an efficient workload reduction algorithm, called Wred, which rewrites each query in the original workload to eliminate column and table expressions unlikely to benefit from indexes, thereby accelerating what-if calls. We study its complexity and ability to maintain high index quality. We perform an extensive evaluation over industry benchmarks and real-world customer workloads, which shows that Wred results in a 3x median speedup in tuning efficiency over an industrial-strength state-of-the-art index advisor, with only a 3.7% median loss in improvement---where improvement is the total workload cost as estimated by the query optimizer---and results in up to 24.7x speedup with 1.8% improvement loss. Furthermore, combining Wred and Isum (a state-of-the-art workload compression technique for index tuning) results in higher speedups than either of the two techniques alone, with 10.5x median speedup and 5% median improvement loss.

References

[1]
Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollar, Arun Marathe, Vivek Narasayya, and Manoj Syamala. 2005. Database tuning advisor for microsoft sql server 2005. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 930--932.
[2]
Sanjay Agrawal, Surajit Chaudhuri, and Vivek R Narasayya. 2000. Automated selection of materialized views and indexes in SQL databases. In VLDB, Vol. 2000. 496--505.
[3]
Matteo Brucato, Tarique Siddiqui, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri. 2024. Wred: Workload Reduction for Scalable Index Tuning (extended version). https://matteo-brucato.github.io/files/wred_extended.
[4]
Nicolas Bruno and Surajit Chaudhuri. 2005. Automatic Physical Database Tuning: A Relaxation-based Approach. In SIGMOD. 227--238.
[5]
Nicolas Bruno and Surajit Chaudhuri. 2006. To tune or not to tune? A Lightweight Physical Design Alerter. In Proceedings of the 32nd international conference on Very large data bases. Citeseer, 499--510.
[6]
Diego Calvanese, Giuseppe De Giacomo, Maurizio Lenzerini, and Moshe Y Vardi. 2000. What is query rewriting?. In Cooperative Information Agents IV-The Future of Information Agents in Cyberspace: 4th International Workshop, CIA 2000, Boston, MA, USA, July 7--9, 2000. Proceedings 4. Springer, 51--59.
[7]
Surajit Chaudhuri, Ashish Kumar Gupta, and Vivek Narasayya. 2002. Compressing SQL workloads. In Proceedings of the 2002 ACM SIGMOD international conference on Management of data. 488--499.
[8]
Surajit Chaudhuri, Ashish Kumar Gupta, and Vivek R. Narasayya. 2002. Compressing SQL workloads. In SIGMOD. 488--499.
[9]
Surajit Chaudhuri and Vivek Narasayya. 2020. Anytime Algorithm of Database Tuning Advisor for Microsoft SQL Server. (June 2020). https://www.microsoft.com/en-us/research/publication/anytime-algorithm-of-database-tuning-advisor-for-microsoft-sql-server/
[10]
Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB 1997. 146--155. http://www.vldb.org/conf/1997/P146.PDF
[11]
Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD. 367--378.
[12]
Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Jeffrey Naughton, and Stratis Viglas. 2020. Comprehensive and efficient workload compression. arXiv preprint arXiv:2011.05549 (2020).
[13]
Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Jeffrey F. Naughton, and Stratis Viglas. 2020. Comprehensive and Efficient Workload Compression. Proc. VLDB Endow. 14, 3 (2020), 418--430.
[14]
Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek R. Narasayya. 2019. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. In SIGMOD. 1241--1258.
[15]
Anna Fariha, Sheikh Muhammad Sarwar, and Alexandra Meliou. 2018. SQuID: Semantic similarity-aware query intent discovery. In Proceedings of the 2018 International Conference on Management of Data. 1745--1748.
[16]
Yujian Gan, Xinyun Chen, Jinxia Xie, Matthew Purver, John R Woodward, John Drake, and Qiaofu Zhang. 2021. Natural SQL: Making SQL easier to infer from natural language specifications. arXiv preprint arXiv:2109.05153 (2021).
[17]
Antara Ghosh, Jignashu Parikh, Vibhuti S Sengar, and Jayant R Haritsa. 2002. Plan selection based on query clustering. In VLDB'02: Proceedings of the 28th International Conference on Very Large Databases. Elsevier, 179--190.
[18]
Yuxing Han, Ziniu Wu, Peizhi Wu, Rong Zhu, Jingyi Yang, Liang Wei Tan, Kai Zeng, Gao Cong, Yanzhao Qin, Andreas Pfadler, Zhengping Qian, Jingren Zhou, Jiangneng Li, and Bin Cui. 2021. Cardinality Estimation in DBMS: A Comprehensive Benchmark Evaluation. Proc. VLDB Endow. 15, 4 (2021), 752--765. https://doi.org/10.14778/3503585.3503586
[19]
Shrainik Jain, Bill Howe, Jiaqi Yan, and Thierry Cruanes. 2018. Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics. arXiv preprint arXiv:1801.05613 (2018).
[20]
Andrew Kane. 2017. The automatic indexer for Postgres. https://github.com/ankane/dexter.
[21]
Andrew Kane. 2017. Introducing Dexter, the Automatic Indexer for Postgres. (June 2017). https://medium.com/@ankane/introducing-dexter-the-automatic-indexer-for-postgres-5f8fa8b28f27 last accessed November 2022.
[22]
Andrew Kane. 2017. Introducing Dexter, the Automatic Indexer for Postgres. https://medium.com/@ankane/introducing-dexter-the-automatic-indexer-for-postgres-5f8fa8b28f27.
[23]
Richard M Karp. 1972. Reducibility among combinatorial problems, Complexity of computer computations (RE Miller and JW Thatcher, editors).
[24]
Richard M Karp. 2010. Reducibility among combinatorial problems. Springer.
[25]
Hyeonji Kim, Byeong-Hoon So, Wook-Shin Han, and Hongrae Lee. 2020. Natural language to SQL: Where are we today? Proceedings of the VLDB Endowment 13, 10 (2020), 1737--1750.
[26]
Jan Kossmann, Stefan Halfpap, Marcel Jankrift, and Rainer Schlosser. 2020. Magic Mirror in My Hand, Which is The Best in The Land? An Experimental Evaluation of Index Selection Algorithms. VLDB 13, 12 (2020), 2382--2395. http://www.vldb.org/pvldb/vol13/p2382-kossmann.pdf
[27]
Gokhan Kul, Duc Luong, Ting Xie, Patrick Coonan, Varun Chandola, Oliver Kennedy, and Shambhu Upadhyaya. 2016. Ettu: Analyzing query intents in corporate databases. In Proceedings of the 25th international conference companion on world wide web. 463--466.
[28]
Hao Li, Chee-Yong Chan, and David Maier. 2015. Query from examples: An iterative, data-driven approach to query construction. Proceedings of the VLDB Endowment 8, 13 (2015), 2158--2169.
[29]
Jiexing Li, Arnd Christian König, Vivek Narasayya, and Surajit Chaudhuri. 2012. Robust Estimation of Resource Consumption for SQL Queries Using Statistical Techniques. Proc. VLDB Endow. 5, 11 (jul 2012), 1555--1566. https://doi.org/10.14778/2350229.2350269
[30]
Stratos Papadomanolakis, Debabrata Dash, and Anastasia Ailamaki. 2007. Efficient use of the query optimizer for automated physical design. In Proceedings of the 33rd international conference on Very large data bases. 1093--1104.
[31]
Yannis Papakonstantinou and Vasilis Vassalos. 1999. Query rewriting for semistructured data. ACM SIGMOD Record 28, 2 (1999), 455--466.
[32]
Debjyoti Paul, Jie Cao, Feifei Li, and Vivek Srikumar. 2021. Database Workload Characterization with Query Plan Encoders. Proc. VLDB Endow. 15, 4 (dec 2021), 923--935. https://doi.org/10.14778/3503585.3503600
[33]
Ville Satopaa, Jeannie Albrecht, David Irwin, and Barath Raghavan. 2011. Finding a "Kneedle" in a Haystack: Detecting Knee Points in System Behavior. In 2011 31st International Conference on Distributed Computing Systems Workshops. 166--171. https://doi.org/10.1109/ICDCSW.2011.20
[34]
Tarique Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek Narasayya, and Surajit Chaudhuri. 2022. ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning. In SIGMOD 2022. 660--673. https://doi.org/10.1145/3514221.3526152
[35]
Tarique Siddiqui, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri. 2022. DISTILL: low-overhead data-driven techniques for filtering and costing indexes for scalable index tuning. Proceedings of the VLDB Endowment 15, 10 (2022), 2019--2031.
[36]
The TPC Benchmark?DS 2023. http://www.tpc.org/tpcds/.
[37]
The TPC Benchmark?H 2023. http://www.tpc.org/tpch/.
[38]
G. Valentin, M. Zuliani, D.C. Zilio, G. Lohman, and A. Skelley. 2000. DB2 advisor: an optimizer smart enough to recommend its own indexes. In Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073). 101--110. https://doi.org/10.1109/ICDE.2000.839397
[39]
Chengcheng Wan, Yiwen Zhu, Joyce Cahoon, Wenjing Wang, Katherine Lin, Sean Liu, Raymond Truong, Neetu Singh, Alexandra Ciortea, Konstantinos Karanasos, et al. 2023. Stitcher: Learned Workload Synthesis from Historical Performance Footprints. (2023).
[40]
Chenglong Wang, Alvin Cheung, and Rastislav Bodik. 2017. Interactive query synthesis from input-output examples. In Proceedings of the 2017 ACM International Conference on Management of Data. 1631--1634.
[41]
Wentao Wu, Yun Chi, Hakan Hacígümüs, and Jeffrey F. Naughton. 2013. Towards Predicting Query Execution Time for Concurrent and Dynamic Database Workloads. Proc. VLDB Endow. 6, 10 (aug 2013), 925--936. https://doi.org/10.14778/2536206.2536219
[42]
Wentao Wu, Yun Chi, Shenghuo Zhu, Junichi Tatemura, Hakan Hacigümüs, and Jeffrey F. Naughton. 2013. Predicting query execution time: Are optimizer cost models really unusable?. In 2013 IEEE 29th International Conference on Data Engineering (ICDE). 1081--1092. https://doi.org/10.1109/ICDE.2013.6544899
[43]
Wentao Wu, Chi Wang, Tarique Siddiqui, Junxiong Wang, Vivek Narasayya, Surajit Chaudhuri, and Philip A. Bernstein. 2022. Budget-Aware Index Tuning with Reinforcement Learning. In Proceedings of the 2022 International Conference on Management of Data (Philadelphia, PA, USA) (SIGMOD '22). Association for Computing Machinery, New York, NY, USA, 1528--1541. https://doi.org/10.1145/3514221.3526128
[44]
Navid Yaghmazadeh, Yuepeng Wang, Isil Dillig, and Thomas Dillig. 2017. SQLizer: query synthesis from natural language. Proc. ACM Program. Lang. 1, OOPSLA (2017), 63:1--63:26. https://doi.org/10.1145/3133887
[45]
Xuanhe Zhou, Guoliang Li, Chengliang Chai, and Jianhua Feng. 2021. A Learned Query Rewrite System Using Monte Carlo Tree Search. Proc. VLDB Endow. 15, 1 (sep 2021), 46--58. https://doi.org/10.14778/3485450.3485456

Cited By

View all
  • (2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
  • (2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024

Recommendations

Comments

Information & Contributors

Information

Published In

cover image Proceedings of the ACM on Management of Data
Proceedings of the ACM on Management of Data  Volume 2, Issue 1
SIGMOD
February 2024
1874 pages
EISSN:2836-6573
DOI:10.1145/3654807
Issue’s Table of Contents
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 March 2024
Published in PACMMOD Volume 2, Issue 1

Permissions

Request permissions for this article.

Author Tags

  1. index tuning
  2. query rewriting.
  3. workload compression
  4. workload reduction

Qualifiers

  • Research-article

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)239
  • Downloads (Last 6 weeks)53
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
  • (2024)Wii: Dynamic Budget Reallocation In Index TuningProceedings of the ACM on Management of Data10.1145/36549852:3(1-26)Online publication date: 30-May-2024

View Options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Get Access

Login options

Full Access

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media