The case for NLP-enhanced database tuning: towards tuning tools that "read the manual"

Published: 01 March 2021

Abstract

A large body of knowledge on database tuning is available in the form of natural language text. We propose to leverage natural language processing (NLP) to make that knowledge accessible to automated tuning tools. We describe multiple avenues to exploit NLP for database tuning, and outline associated challenges and opportunities. As a proof of concept, we describe a simple prototype system that exploits recent NLP advances to mine tuning hints from Web documents. We show that mined tuning hints improve performance of MySQL and Postgres on TPC-H, compared to the default configuration.

    Published In

    Proceedings of the VLDB Endowment, Volume 14, Issue 7
    March 2021
    130 pages
    ISSN: 2150-8097

    Publisher

    VLDB Endowment

    Cited By

    • (2024) ECO-LLM: LLM-based Edge Cloud Optimization. Proceedings of the 2024 Workshop on AI For Systems, pp. 7-12. DOI: 10.1145/3660605.3660941. Online publication date: 3-Jun-2024
    • (2024) A Demonstration of GPTuner: A GPT-Based Manual-Reading Database Tuning System. Companion of the 2024 International Conference on Management of Data, pp. 504-507. DOI: 10.1145/3626246.3654739. Online publication date: 9-Jun-2024
    • (2023) How Large Language Models Will Disrupt Data Management. Proceedings of the VLDB Endowment 16:11, pp. 3302-3309. DOI: 10.14778/3611479.3611527. Online publication date: 24-Aug-2023
    • (2023) Enhancing the Configuration Tuning Pipeline of Large-Scale Distributed Applications Using Large Language Models (Idea Paper). Companion of the 2023 ACM/SPEC International Conference on Performance Engineering, pp. 39-44. DOI: 10.1145/3578245.3585032. Online publication date: 15-Apr-2023
    • (2022) Waffle. Proceedings of the VLDB Endowment 15:11, pp. 2375-2388. DOI: 10.14778/3551793.3551800. Online publication date: 29-Sep-2022
    • (2021) Database Tuning using Natural Language Processing. ACM SIGMOD Record 50:3, pp. 27-28. DOI: 10.1145/3503780.3503788. Online publication date: 2-Dec-2021
