Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3514221.3526152acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning

Published: 11 June 2022 Publication History

Abstract

Today's database systems include index advisors that recommend an appropriate set of indexes for an input workload. Since index tuning on large and complex workloads can be resource-intensive and time-consuming, workload compression techniques have been proposed to improve the scalability of index tuning. Workload compression techniques aim to efficiently identify a small subset of queries in the workload to tune such that the indexes recommended when tuning the compressed workload give similar performance improvements as when tuning the input workload. In this paper, we propose ISUM, a new workload compression algorithm that is based on two key ideas: a low-overhead technique for estimating the improvement in performance of the input workload when a subset of queries is selected for index tuning, and a novel method for concisely representing information across queries in the workload that improves scalability by avoiding pairwise comparisons between queries when choosing the set of queries to tune. Our evaluation over industry benchmarks and real-world customer workloads shows that ISUM results in a 1.4x of median and 2x of maximum performance improvements for the input workload when compared to prior techniques over similar compressed workload sizes.

Supplementary Material

MP4 File (ISUM-SIGMOD22.mp4)
Presentation video.

References

[1]
Apr 01, 2022. Azure SQL Database. https://azure.microsoft.com/en-us/products/azure-sql/database/.
[2]
Apr 01, 2022. DEXTER Index Tuning Tool on PostgresSQL. https://github.com/ankane/dexter.
[3]
Apr 01, 2022. DTA utility. https://docs.microsoft.com/en-us/sql/tools/dta/dta-utility?view=sql-server-ver15.
[4]
Apr 01, 2022. Google Cloud SQL. https://cloud.google.com/sql.
[5]
Apr 01, 2022. Query Store. https://docs.microsoft.com/en-us/sql/relational-databases/performance/monitoring-performance-by-using-the-query-store?view=sql-server-ver15.
[6]
Apr 01, 2022. Statistics. https://docs.microsoft.com/en-us/sql/relational-databases/statistics/statistics?view=sql-server-ver15.
[7]
Sanjay Agrawal, Surajit Chaudhuri, Lubor Kollar, Arun Marathe, Vivek Narasayya, and Manoj Syamala. 2005. Database tuning advisor for microsoft sql server 2005. In Proceedings of the 2005 ACM SIGMOD international conference on Management of data. 930--932.
[8]
Nicolas Bruno and Surajit Chaudhuri. 2005. Automatic Physical Database Tuning: A Relaxation-based Approach. In SIGMOD. 227--238.
[9]
Nicolas Bruno and Surajit Chaudhuri. 2006. To tune or not to tune? A Lightweight Physical Design Alerter. In Proceedings of the 32nd international conference on Very large data bases. Citeseer, 499--510.
[10]
Surajit Chaudhuri, Mayur Datar, and Vivek R. Narasayya. 2004. Index Selection for Databases: A Hardness Study and a Principled Heuristic Solution. IEEE Trans. Knowl. Data Eng. 16, 11 (2004), 1313--1323.
[11]
Surajit Chaudhuri, Ashish Kumar Gupta, and Vivek R. Narasayya. 2002. Com- pressing SQL workloads. In SIGMOD. 488--499.
[12]
Surajit Chaudhuri and Vivek Narasayya. 2020. Anytime Algorithm of Database Tuning Advisor for Microsoft SQL Server. https://www.microsoft.com/en-us/research/publication/anytime-algorithm-of-database-tuning-advisor-for-microsoft-sql-server/.
[13]
Surajit Chaudhuri, Vivek Narasayya, and Prasanna Ganesan. 2003. Primitives for workload summarization and implications for SQL. In Proceedings 2003 VLDB Conference. Elsevier, 730--741.
[14]
Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In VLDB. 146--155.
[15]
Surajit Chaudhuri and Vivek R. Narasayya. 1998. AutoAdmin 'What-if' Index Analysis Utility. In SIGMOD. 367--378.
[16]
Surajit Chaudhuri and Vivek R. Narasayya. 1999. Index Merging. In ICDE.
[17]
Douglas Comer. 1978. The Difficulty of Optimum Index Selection. ACM Trans. Database Syst. 3, 4 (1978), 440--445.
[18]
Sudipto Das, Miroslav Grbic, Igor Ilic, Isidora Jovandic, Andrija Jovanovic, Vivek R. Narasayya, Miodrag Radulovic, Maja Stikic, Gaoxiang Xu, and Surajit Chaudhuri. 2019. Automatically Indexing Millions of Databases in Microsoft Azure SQL Database. In SIGMOD. 666--679.
[19]
Anirban Dasgupta, Ravi Kumar, and Sujith Ravi. 2013. Summarization through submodularity and dispersion. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 1014--1022.
[20]
Shaleen Deep, Anja Gruenheid, Paraschos Koutris, Jeffrey F. Naughton, and Stratis Viglas. 2020. Comprehensive and Efficient Workload Compression. Proc. VLDB Endow. 14, 3 (2020), 418--430.
[21]
Bailu Ding, Surajit Chaudhuri, Johannes Gehrke, and Vivek R. Narasayya. 2021. DSB: A Decision Support Benchmark for Workload-Driven and Traditional Database Systems. Proc. VLDB Endow. 14, 13 (2021), 3376--3388.
[22]
S. J. Finkelstein, M. Schkolnick, and P. Tiberio. 1988. Physical Database Design for Relational Databases. ACM Trans. Database Syst. 13, 1 (1988).
[23]
Antara Ghosh, Jignashu Parikh, Vibhuti S Sengar, and Jayant R Haritsa. 2002. Plan selection based on query clustering. In VLDB'02: Proceedings of the 28th International Conference on Very Large Databases. Elsevier, 179--190.
[24]
Teofilo F Gonzalez. 1985. Clustering to minimize the maximum intercluster distance. Theoretical computer science 38 (1985), 293--306.
[25]
Shrainik Jain, Bill Howe, Jiaqi Yan, and Thierry Cruanes. 2018. Query2Vec: An Evaluation of NLP Techniques for Generalized Workload Analytics. arXiv preprint arXiv:1801.05613 (2018).
[26]
Jan Kossmann, Stefan Halfpap, Marcel Jankrift, and Rainer Schlosser. 2020. Magic mirror in my hand, which is the best in the land? An Experimental Evaluation of Index Selection Algorithms. Proc. VLDB Endow. 13, 11 (2020), 2382--2395.
[27]
Andreas Krause and Daniel Golovin. 2014. Submodular Function Maximization. In Tractability: Practical Approaches to Hard Problems. Cambridge University Press, 71--104.
[28]
Gokhan Kul, Duc Luong, Ting Xie, Patrick Coonan, Varun Chandola, Oliver Kennedy, and Shambhu Upadhyaya. 2016. Ettu: Analyzing query intents in corporate databases. In Proceedings of the 25th international conference companion on world wide web. 463--466.
[29]
Rishabh Mehrotra and Emine Yilmaz. 2015. Representative & informative query selection for learning to rank using submodular functions. In Proceedings of the 38th international ACM sigir conference on research and development in information retrieval. 545--554.
[30]
George L. Nemhauser, Laurence A. Wolsey, and Marshall L. Fisher. 1978. An analysis of approximations for maximizing submodular set functions - I. Math. Program. 14, 1 (1978), 265--294.
[31]
Suphakit Niwattanakul, Jatsada Singthongchai, Ekkachai Naenudorn, and Supachanun Wanapu. 2013. Using of Jaccard coefficient for keywords similarity. In Proceedings of the international multiconference of engineers and computer scientists, Vol. 1. 380--384.
[32]
Stratos Papadomanolakis, Debabrata Dash, and Anastassia Ailamaki. 2007. Efficient Use of the Query Optimizer for Automated Database Design. ACM.
[33]
Sayan Ranu, Minh Hoang, and Ambuj Singh. 2014. Answering top-k representative queries on graph databases. In Proceedings of the 2014 ACM SIGMOD international conference on Management of data. 1163--1174.
[34]
Tarique Siddiqui, Saehan Jo, Wentao Wu, Chi Wang, Vivek Narasayya, and Surajit Chaudhuri. 2022. ISUM: Efficiently Compressing Large and Complex Workloads for Scalable Index Tuning (MSR Technical Report). https://www.microsoft.com/en-us/research/publication/isum/.
[35]
Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. 2000. DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes. In ICDE. 101--110.

Cited By

View all
  • (2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
  • (2024)Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language ModelsProceedings of the VLDB Endowment10.14778/3681954.368201717:11(3511-3523)Online publication date: 30-Aug-2024
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
June 2022
2597 pages
ISBN:9781450392495
DOI:10.1145/3514221
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 11 June 2022

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. index tuning
  2. workload compression

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '22
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)84
  • Downloads (Last 6 weeks)4
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2024)Hit the Gym: Accelerating Query Execution to Efficiently Bootstrap Behavior Models for Self-Driving Database Management SystemsProceedings of the VLDB Endowment10.14778/3681954.368203017:11(3680-3693)Online publication date: 1-Jul-2024
  • (2024)Generating Succinct Descriptions of Database Schemata for Cost-Efficient Prompting of Large Language ModelsProceedings of the VLDB Endowment10.14778/3681954.368201717:11(3511-3523)Online publication date: 30-Aug-2024
  • (2024)The Holon Approach for Simultaneously Tuning Multiple Components in a Self-Driving Database Management System with Machine Learning via Synthesized Proto-ActionsProceedings of the VLDB Endowment10.14778/3681954.368200717:11(3373-3387)Online publication date: 30-Aug-2024
  • (2024)Refactoring Index Tuning Process with Benefit EstimationProceedings of the VLDB Endowment10.14778/3654621.365462217:7(1528-1541)Online publication date: 30-May-2024
  • (2024)ML-Powered Index Tuning: An Overview of Recent Progress and Open ChallengesACM SIGMOD Record10.1145/3641832.364183652:4(19-30)Online publication date: 19-Jan-2024
  • (2024)Wred: Workload Reduction for Scalable Index TuningProceedings of the ACM on Management of Data10.1145/36393052:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Robustness of Updatable Learning-based Index Advisors against Poisoning AttackProceedings of the ACM on Management of Data10.1145/36392652:1(1-26)Online publication date: 26-Mar-2024
  • (2024)Compressed Data Direct Computing for DatabasesIEEE Transactions on Knowledge and Data Engineering10.1109/TKDE.2023.331627436:5(1902-1918)Online publication date: May-2024
  • (2024)MFIX: An Efficient and Reliable Index Advisor via Multi-Fidelity Bayesian Optimization2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00331(4343-4356)Online publication date: 13-May-2024
  • (2024)F-TADOC: FPGA-Based Text Analytics Directly on Compression with HLS2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00287(3739-3752)Online publication date: 13-May-2024
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media