Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3318464.3380602acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

Facilitating SQL Query Composition and Analysis

Published: 31 May 2020 Publication History

Abstract

Formulating efficient SQL queries requires several cycles of tuning and execution. We examine methods that can accelerate and improve this interaction by providing insights about SQL queries prior to execution. We achieve this by predicting properties such as the query answer size, its run-time, and error class. Unlike existing approaches, our approach does not rely on any statistics from the database instance or query execution plans. Our approach is based on using data-driven machine learning techniques that rely on large query workloads to model SQL queries and their properties. Empirical results show that the neural network models are more accurate in predicting several query properties.

Supplementary Material

MP4 File (3318464.3380602.mp4)
Presentation Video

References

[1]
Mert Akdere, Ugur Cetintemel, Matteo Riondato, Eli Upfal, and Stanley B Zdonik. 2012. Learning-Based Query Performance Modeling and Prediction. In ICDE. 390--401.
[2]
Miltiadis Allamanis, Earl T Barr, Premkumar Devanbu, and Charles Sutton. 2018. A Survey of Machine Learning for Big Code and Naturalness. Comput. Surveys, Vol. 51, 4 (2018), 81.
[3]
Shaojie Bai, J. Zico Kolter, and Vladlen Koltun. 2018. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. ArXiv, Vol. 1803.01271 (2018).
[4]
Yoshua Bengio. 2011. Deep Learning of Representations for Unsupervised and Transfer Learning. In UTLW. 17--37.
[5]
Sonia Bergamaschi, Elton Domnori, Francesco Guerra, Raquel Trillo Lado, and Yannis Velegrakis. 2011. Keyword Search over Relational Databases: A Metadata Approach. In SIGMOD. 565--576.
[6]
Gloria Chatzopoulou, Magdalini Eirinaki, and Neoklis Polyzotis. 2009. Query Recommendations for Interactive Database Exploration. In SSDBM. 3--18.
[7]
Surajit Chaudhuri, Ashish Kumar Gupta, and Vivek Narasayya. 2002. Compressing SQL Workloads. In SIGMOD. 488--499.
[8]
Alexis Conneau, Holger Schwenk, Lo"ic Barrault, and Yann Lecun. 2016. Very Deep Convolutional Networks for Text Classification. ArXiv, Vol. 1606.01781 (2016).
[9]
Kyriaki Dimitriadou, Olga Papaemmanouil, and Yanlei Diao. 2014. Explore-by-Example: An Automatic Query Steering Framework for Interactive Data Exploration. In SIGMOD. 517--528.
[10]
Bailu Ding, Sudipto Das, Ryan Marcus, Wentao Wu, Surajit Chaudhuri, and Vivek Narasayya. 2019. AI Meets AI: Leveraging Query Executions to Improve Index Recommendations. In SIGMOD. 1241--1258.
[11]
Magdalini Eirinaki, Suju Abraham, Neoklis Polyzotis, and Naushin Shaikh. 2014. QueRIE: Collaborative Database Exploration. TKDE, Vol. 26, 7 (2014), 1778--1790.
[12]
Ju Fan, Guoliang Li, and Lizhu Zhou. 2011. Interactive SQL Query Suggestion: Making Databases User-Friendly. In ICDE. 351--362.
[13]
Archana Ganapathi, Harumi Kuno, Umeshwar Dayal, Janet L Wiener, Armando Fox, Michael Jordan, and David Patterson. 2009. Predicting Multiple Metrics for Queries: Better Decisions Enabled by Machine Learning. In ICDE. 592--603.
[14]
Ian Goodfellow, Yoshua Bengio, and Aaron Courville. 2016. Deep Learning .MIT Press.
[15]
Google. 2010. Big Query. https://cloud.google.com/bigquery/.
[16]
Hakan Hacigümücs, Bala Iyer, Chen Li, and Sharad Mehrotra. 2002. Executing SQL over Encrypted Data in the Database-Service-Provider Model. In SIGMOD. 216--227.
[17]
Peter J Huber et al. 1964. Robust estimation of a location parameter. The Annals of Mathematical Statistics, Vol. 35, 1 (1964), 73--101.
[18]
Amazon Inc. 2012a. Amazon Redshift. https://aws.amazon.com/redshift/.
[19]
Snowflake Inc. 2012b. Snowflake. https://www.snowflake.com/.
[20]
HV Jagadish, Adriane Chapman, Aaron Elkiss, Magesh Jayapandian, Yunyao Li, Arnab Nandi, and Cong Yu. 2007. Making Database Systems Usable. In SIGMOD. 13--24.
[21]
Shrainik Jain and Bill Howe. 2018. Query2Vec: NLP Meets Databases for Generalized Workload Analytics. ArXiv, Vol. 1801.05613 (2018).
[22]
Shrainik Jain, Dominik Moritz, Daniel Halperin, Bill Howe, and Ed Lazowska. 2016. SQLShare: Results from a Multi-Year SQL-as-a-Service Experiment. In SIGMOD. 281--293.
[23]
Shrainik Jain, Jiaqi Yan, Thierry Cruane, and Bill Howe. 2018. Database-Agnostic Workload Management. ArXiv, Vol. 1808.08355 (2018).
[24]
Magesh Jayapandian and HV Jagadish. 2009. Automating the Design and Construction of Query Forms. TKDE, Vol. 21, 10 (2009), 1389--1402.
[25]
Rie Johnson and Tong Zhang. 2014. Effective Use of Word Order for Text Categorization with Convolutional Neural Networks. ArXiv, Vol. 1412.1058 (2014).
[26]
Rie Johnson and Tong Zhang. 2015. Semi-Supervised Convolutional Neural Networks for Text Categorization via Region Embedding. In Advances in Neural Information Processing Systems. 919--927.
[27]
Rie Johnson and Tong Zhang. 2016a. Convolutional neural networks for text categorization: Shallow Word-Level vs. Deep Character-Level. ArXiv, Vol. 1609.00718 (2016).
[28]
Rie Johnson and Tong Zhang. 2016b. Supervised and Semi-Supervised Text Categorization using LSTM for Region Embeddings. In ICML. 526--534.
[29]
Rie Johnson and Tong Zhang. 2017. Deep Pyramid Convolutional Neural Networks for Text Categorization. In ACL. 562--570.
[30]
Nodira Khoussainova, YongChul Kwon, Magdalena Balazinska, and Dan Suciu. 2010. SnipSuggest: Context-Aware Autocompletion for SQL. PVLDB, Vol. 4, 1 (2010), 22--33.
[31]
Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. ArXiv, Vol. 1408.5882 (2014).
[32]
Yoon Kim, Yacine Jernite, David Sontag, and Alexander M Rush. 2016. Character-Aware Neural Language Models. In AAAI. 2741--2749.
[33]
Diederik P Kingma and Jimmy Ba. 2014. Adam: A Method for Stochastic Optimization. ArXiv, Vol. 1412.6980 (2014).
[34]
Gokhan Kul, Duc Luong, Ting Xie, Patrick Coonan, Varun Chandola, Oliver Kennedy, and Shambhu Upadhyaya. 2016. Ettu: Analyzing Query Intents in Corporate Databases. In WWW. 463--466.
[35]
Yann LeCun, Bernhard Boser, John S Denker, Donnie Henderson, Richard E Howard, Wayne Hubbard, and Lawrence D Jackel. 1989. Backpropagation Applied to Handwritten Zip Code Recognition. Neural computation, Vol. 1, 4 (1989), 541--551.
[36]
Viktor Leis, Andrey Gubichev, Atanas Mirchev, Peter Boncz, Alfons Kemper, and Thomas Neumann. 2015. How Good Are Query Optimizers, Really? PVLDB, Vol. 9, 3 (2015), 204--215.
[37]
Fei Li and HV Jagadish. 2014. Constructing an Interactive Natural Language Interface for Relational Databases. PVLDB, Vol. 8, 1 (2014), 73--84.
[38]
Jiexing Li, Arnd Christian König, Vivek Narasayya, and Surajit Chaudhuri. 2012. Robust Estimation of Resource Consumption for SQL Queries using Statistical Techniques. PVLDB, Vol. 5, 11 (2012), 1555--1566.
[39]
Henry Liu, Mingbin Xu, Ziting Yu, Vincent Corvinelli, and Calisto Zuzarte. 2015. Cardinality Estimation using Neural Networks. In CASCON. 53--59.
[40]
Hoang Vu Nguyen, Klemens Böhm, Florian Becker, Bertrand Goldman, Georg Hinkel, and Emmanuel Müller. 2015. Identifying User Interests within the Data Space-a Case Study with SkyServer. In EDBT. 641--652.
[41]
William O'Mullane, Nolan Li, Maria Nieto-Santisteban, Alex Szalay, Ani Thakar, and Jim Gray. 2005. Batch is Back: CasJobs, Serving Multi-TB Data on the Web. In ICWS. 33--40.
[42]
Terence Parr. 2013. The Definitive ANTLR 4 Reference. Pragmatic Bookshelf.
[43]
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-Learn: Machine Learning in Python. JMLR, Vol. 12 (2011), 2825--2830.
[44]
M Jordan Raddick, Ani R Thakar, Alexander S Szalay, and Rafael DC Santos. 2014a. Ten Years of SkyServer I: Tracking Web and SQL e-Science Usage. Computing in Science & Engineering, Vol. 16, 4 (2014), 22--31.
[45]
M Jordan Raddick, Ani R Thakar, Alexander S Szalay, and Rafael DC Santos. 2014b. Ten Years of SkyServer II: How Astronomers and the Public Have Embraced e-Science. Computing in Science & Engineering, Vol. 16, 4 (2014), 32--40.
[46]
Yanyan Shen, Kaushik Chakrabarti, Surajit Chaudhuri, Bolin Ding, and Lev Novik. 2014. Discovering Queries Based on Example Tuples. In SIGMOD. 493--504.
[47]
Vik Singh, Jim Gray, Ani Thakar, Alexander S Szalay, Jordan Raddick, Bill Boroski, Svetlana Lebedeva, and Brian Yanny. 2007. Skyserver Traffic Report - The First Five Years. ArXiv, Vol. 0701173 (2007).
[48]
Kostas Stefanidis, Marina Drosou, and Evaggelia Pitoura. 2009. “You May Also Like” Results in Relational Databases. In PersDB. 37--42.
[49]
Alexander S Szalay. 2018. From SkyServer to SciServer. AAPSS, Vol. 675, 1 (2018), 202--220.
[50]
Alexander S Szalay, Jim Gray, Ani R Thakar, Peter Z Kunszt, Tanu Malik, Jordan Raddick, Christopher Stoughton, and Jan vandenBerg. 2002. The SDSS Skyserver: Public Access to the Sloan Digital Sky Server Data. In SIGMOD. 570--581.
[51]
Kai Sheng Tai, Richard Socher, and Christopher D Manning. 2015. Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. ArXiv, Vol. 1503.00075 (2015).
[52]
TPC. 2018. TPC Benchmarks. http://www.tpc.org.
[53]
Wei Wang, Meihui Zhang, Gang Chen, H. V. Jagadish, Beng Chin Ooi, and Kian-Lee Tan. 2016. Database Meets Deep Learning: Challenges and Opportunities. SIGMOD Rec., Vol. 45, 2 (2016), 17--22.
[54]
Wenpeng Yin, Katharina Kann, Mo Yu, and Hinrich Schütze. 2017. Comparative Study of CNN and RNN for Natural Language Processing. ArXiv, Vol. 1702.01923 (2017).
[55]
Jason Yosinski, Jeff Clune, Yoshua Bengio, and Hod Lipson. 2014. How Transferable Are Features in Deep Neural Networks?. In NIPS. 3320--3328.
[56]
Tom Young, Devamanyu Hazarika, Soujanya Poria, and Erik Cambria. 2018. Recent Trends in Deep Learning Based Natural Language Processing. IEEE Computational Intelligence Magazine, Vol. 13, 3 (2018), 55--75.
[57]
Wojciech Zaremba and Ilya Sutskever. 2014. Learning to Execute. ArXiv, Vol. 1410.4615 (2014).
[58]
Zhong Zeng, Zhifeng Bao, Thuy Ngoc Le, Mong Li Lee, and Wang Tok Ling. 2014. ExpressQ: Identifying Keyword Context and Search Target in Relational Keyword Queries. In CIKM. 31--40.
[59]
Xiang Zhang, Junbo Zhao, and Yann LeCun. 2015. Character-Level Convolutional Networks for Text Classification. In NIPS. 649--657.
[60]
Victor Zhong, Caiming Xiong, and Richard Socher. 2017. Seq2SQL: Generating Structured Queries From Natural Language using Reinforcement Learning. ArXiv, Vol. 1709.00103 (2017).
[61]
Zainab Zolaktaf, Mostafa Milani, and Rachel Pottinger. 2020. Facilitating SQL Query Composition and Analysis. ArXiv, Vol. 2002.09091 (2020).

Cited By

View all
  • (2024)ReJOOSp: Reinforcement Learning for Join Order Optimization in SPARQLBig Data and Cognitive Computing10.3390/bdcc80700718:7(71)Online publication date: 27-Jun-2024
  • (2024)Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generationJournal of Biomedical Informatics10.1016/j.jbi.2024.104649154(104649)Online publication date: Jun-2024
  • (2023)FASTune: Towards Fast and Stable Database Tuning System with Reinforcement LearningElectronics10.3390/electronics1210216812:10(2168)Online publication date: 10-May-2023
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '20: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data
June 2020
2925 pages
ISBN:9781450367356
DOI:10.1145/3318464
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 May 2020

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. machine learning
  2. query property prediction
  3. workload

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '20
Sponsor:

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)63
  • Downloads (Last 6 weeks)4
Reflects downloads up to 30 Aug 2024

Other Metrics

Citations

Cited By

View all
  • (2024)ReJOOSp: Reinforcement Learning for Join Order Optimization in SPARQLBig Data and Cognitive Computing10.3390/bdcc80700718:7(71)Online publication date: 27-Jun-2024
  • (2024)Criteria2Query 3.0: Leveraging generative large language models for clinical trial eligibility query generationJournal of Biomedical Informatics10.1016/j.jbi.2024.104649154(104649)Online publication date: Jun-2024
  • (2023)FASTune: Towards Fast and Stable Database Tuning System with Reinforcement LearningElectronics10.3390/electronics1210216812:10(2168)Online publication date: 10-May-2023
  • (2023)Using Machine Learning and Routing Protocols for Optimizing Distributed SPARQL Queries in CollaborationComputers10.3390/computers1210021012:10(210)Online publication date: 17-Oct-2023
  • (2023)Real-Time Workload Pattern Analysis for Large-Scale Cloud DatabasesProceedings of the VLDB Endowment10.14778/3611540.361155716:12(3689-3701)Online publication date: 1-Aug-2023
  • (2023)Human-in-the-Loop Evolution of Database Views for Data Exploration2023 IEEE Latin American Conference on Computational Intelligence (LA-CCI)10.1109/LA-CCI58595.2023.10409423(1-6)Online publication date: 29-Oct-2023
  • (2022)Intelligent Automated Workload Analysis for Database ReplatformingProceedings of the 2022 International Conference on Management of Data10.1145/3514221.3526050(2273-2285)Online publication date: 10-Jun-2022
  • (2022)Smarter Warehouse2022 IEEE 38th International Conference on Data Engineering Workshops (ICDEW)10.1109/ICDEW55742.2022.00005(1-8)Online publication date: May-2022
  • (2021)Efficient Deep Learning Pipelines for Accurate Cost Estimations Over Large Scale Query WorkloadProceedings of the 2021 International Conference on Management of Data10.1145/3448016.3457546(1014-1022)Online publication date: 9-Jun-2021
  • (2021)"What makes my queries slow?"Proceedings of the 36th IEEE/ACM International Conference on Automated Software Engineering10.1109/ASE51524.2021.9678915(642-652)Online publication date: 15-Nov-2021

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media