research-article

The case for NLP-enhanced database tuning: towards tuning tools that "read the manual"

Author:

Immanuel Trummer

Proceedings of the VLDB Endowment, Volume 14, Issue 7

Pages 1159 - 1165

https://doi.org/10.14778/3450980.3450984

Published: 01 March 2021

Abstract

A large body of knowledge on database tuning is available in the form of natural language text. We propose to leverage natural language processing (NLP) to make that knowledge accessible to automated tuning tools. We describe multiple avenues to exploit NLP for database tuning, and outline associated challenges and opportunities. As a proof of concept, we describe a simple prototype system that exploits recent NLP advances to mine tuning hints from Web documents. We show that mined tuning hints improve performance of MySQL and Postgres on TPC-H, compared to the default configuration.
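
As a minimal illustration of what a mined "tuning hint" could look like, the sketch below pulls parameter-value recommendations out of free text. It is not the prototype described in the abstract, which relies on recent NLP advances: a regular expression stands in for the language model here. The names `TuningHint`, `HINT_PATTERN`, and `mine_hints`, the pattern itself, and the sample sentence with its values are all hypothetical and chosen only for illustration.

```python
import re
from typing import List, NamedTuple


class TuningHint(NamedTuple):
    parameter: str  # configuration parameter mentioned in the text
    value: str      # recommended value, kept as raw text
    sentence: str   # source sentence, kept for provenance


# Hypothetical pattern: a snake_case parameter name, then "=", "to" or "at",
# then a number with an optional unit. A stand-in for a learned model.
HINT_PATTERN = re.compile(
    r"\b([a-z][a-z0-9]*(?:_[a-z0-9]+)+)\b"
    r"\s*(?:=|to|at)\s*"
    r"(\d+(?:\.\d+)?\s?(?:%|[kKmMgG][bB])?)"
)


def mine_hints(text: str) -> List[TuningHint]:
    """Return every parameter-value pair the pattern finds, sentence by sentence."""
    hints = []
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        for match in HINT_PATTERN.finditer(sentence):
            hints.append(
                TuningHint(match.group(1), match.group(2).strip(), sentence.strip())
            )
    return hints


if __name__ == "__main__":
    # Illustrative text only; the values are not recommendations from the paper.
    sample = (
        "For analytics, set shared_buffers to 4GB. "
        "Also consider work_mem = 64MB for large sorts."
    )
    for hint in mine_hints(sample):
        print(hint.parameter, "->", hint.value)
```

A pattern like this only catches hints that spell out an explicit value next to a parameter name. Recommendations phrased relative to system properties, or spread across several sentences, would need the kind of language understanding the abstract argues for.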

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Natural language processing

Index terms have been assigned to the content through auto-classification.

Published In

Proceedings of the VLDB Endowment Volume 14, Issue 7

March 2021

130 pages

ISSN:2150-8097

Editors: Xin Luna Dong (Amazon), Felix Naumann (HPI, University of Potsdam)

Publisher

VLDB Endowment

Cited By

Rao K, Coviello G, Benedetti P, Giuseppe De Vita C, Mellone G, Chakradhar S, Lofstead J, Dayal J. (2024). ECO-LLM: LLM-based Edge Cloud Optimization. Proceedings of the 2024 Workshop on AI For Systems, 7-12. Online publication date: 3-Jun-2024.
https://dl.acm.org/doi/10.1145/3660605.3660941

Lao J, Wang Y, Li Y, Wang J, Zhang Y, Cheng Z, Chen W, Zhou Y, Tang M, Wang J, Barcelo P, Sanchez-Pi N, Meliou A, Sudarshan S. (2024). A Demonstration of GPTuner: A GPT-Based Manual-Reading Database Tuning System. Companion of the 2024 International Conference on Management of Data, 504-507. Online publication date: 9-Jun-2024.
https://dl.acm.org/doi/10.1145/3626246.3654739

Fernandez R, Elmore A, Franklin M, Krishnan S, Tan C. (2023). How Large Language Models Will Disrupt Data Management. Proceedings of the VLDB Endowment 16:11, 3302-3309. Online publication date: 24-Aug-2023.
https://dl.acm.org/doi/10.14778/3611479.3611527

Somashekar G, Kumar R, Vieira M, Cardellini V, Di Marco A, Tuma P. (2023). Enhancing the Configuration Tuning Pipeline of Large-Scale Distributed Applications Using Large Language Models (Idea Paper). Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, 39-44. Online publication date: 15-Apr-2023.
https://dl.acm.org/doi/10.1145/3578245.3585032

Choi D, Yoon H, Lee H, Chung Y. (2022). Waffle. Proceedings of the VLDB Endowment 15:11, 2375-2388. Online publication date: 29-Sep-2022.
https://dl.acm.org/doi/10.14778/3551793.3551800

Trummer I. (2021). Database Tuning using Natural Language Processing. ACM SIGMOD Record 50:3, 27-28. Online publication date: 2-Dec-2021.
https://dl.acm.org/doi/10.1145/3503780.3503788
