Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
skip to main content
10.1145/3183713.3183727acmconferencesArticle/Chapter ViewAbstractPublication PagesmodConference Proceedingsconference-collections
research-article

FastQRE: Fast Query Reverse Engineering

Published: 27 May 2018 Publication History

Abstract

We study the problem of Query Reverse Engineering (QRE), where given a database and an output table, the task is to find a simple project-join SQL query that generates that table when applied on the database. This problem is known for its efficiency challenge due to mainly two reasons. First, the problem has a very large search space and its various variants are known to be NP-hard. Second, executing even a single candidate SQL query can be very computationally expensive. In this work we propose a novel approach for solving the QRE problem efficiently. Our solution outperforms the existing state of the art by 2-3 orders of magnitude for complex queries, resolving those queries in seconds rather than days, thus making our approach more practical in real-life settings.

References

[1]
R. Agrawal, T. Imielinski, and A. Swami. Mining association rules between sets of items in large databases. In SIGMOD, 1993.
[2]
S. Agrawal, S. Chaudhuri, and G. Das. Dbxplorer: A system for keyword-based search over relational databases. In ICDE, 2002.
[3]
B. Alexe, B. ten Cate, P. G. Kolaitis, and W. C. Tan. Designing and refining schema mappings via data examples. In SIGMOD, 2011.
[4]
G. Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S. Sudarshan. Keyword searching and browsing in databases using banks. In ICDE, 2002.
[5]
A. Bonifati, R. Ciucanu, and S. Staworko. Interactive inference of join queries. In EDBT, 2014.
[6]
A. Bonifati, R. Ciucanu, and S. Staworko. Learning join queries from user examples. ACM TODS, 40(4), 2016.
[7]
B. B. Dalvi, M. Kshirsagar, and S. Sudarshan. Keyword search on external memory data graphs. PVLDB, 1(1), 2008.
[8]
A. Das Sarma, A. Parameswaran, H. Garcia-Molina, and J. Widom. Synthesizing view definitions from data. In ICDT, 2010.
[9]
T. Dasu, T. Johnson, S. Muthukrishnan, and V. Shkapenyuk. Mining database structure; or, how to build a data quality browser. In SIGMOD, 2002.
[10]
G. J. Fakas, Z. Cai, and N. Mamoulis. Size-l object summaries for relational keyword search. PVLDB, 5(3), 2011.
[11]
G. Gottlob and P. Senellart. Schema mapping discovery from data instances. J. ACM, 57(2), 2010.
[12]
H. He, H. Wang, J. Yang, and P. S. Yu. Blinks: Ranked keyword searches on graphs. In SIGMOD, 2007.
[13]
V. Hristidis, H. Hwang, and Y. Papakonstantinou. Authority-based keyword search in databases. TODS, 33(1), 2008.
[14]
V. Hristidis and Y. Papakonstantinou. Discover: Keyword search in relational databases. In VLDB, 2002.
[15]
H. Jagadish, A. Chapman, A. Elkiss, M. Jayapandian, Y. Li, A. Nandi, and C. Yu. Making database systems usable. SIGMOD, 2007.
[16]
M. Jayapandian and H. V. Jagadish. Automated creation of a forms-based database query interface. PVLDB, 1(1), 2008.
[17]
G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou. Ease: An effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In SIGMOD, 2008.
[18]
H. Li, C. Chan, and D. Maier. Query from examples: An iterative, data-driven approach to query construction. PVLDB, 8(13), 2015.
[19]
A. Meliou, W. Gatterbauer, and D. Suciu. Reverse data management. PVLDB, 4(12), 2011.
[20]
Microsoft Research. Data generator. ftp://ftp.research.microsoft.com/users/viveknar/TPCDSkew/.
[21]
R. Miller, L. Haas, and M. Hernandez. Schema mapping as query discovery. In VLDB, 1999.
[22]
K. Panev and S. Michel. Reverse engineering top-k database queries with PALEO. In EDBT, 2016.
[23]
T. Papenbrock and F. Naumann. A hybrid approach to functional dependency discovery. In SIGMOD, 2016.
[24]
L. Qian, M. J. Cafarella, and H. V. Jagadish. Sample-driven schema mapping. In SIGMOD, 2012.
[25]
L. Qin, J. X. Yu, and L. Chang. Keyword search in databases: The power of rdbms. In SIGMOD, 2009.
[26]
L. Qin, J. X. Yu, L. Chang, and Y. Tao. Querying communities in relational databases. In ICDE, 2009.
[27]
Y. Shen, K. Chakrabarti, S. Chaudhuri, B. Ding, and L. Novik. Discovering queries based on example tuples. In SIGMOD, 2014.
[28]
W. C. Tan, M. Zhang, H. Elmeleegy, and D. Srivastava. Reverse engineering aggregation queries. In VLDB, 2017.
[29]
TPC. TPC benchmarks. http://www.tpc.org/.
[30]
Q. T. Tran, C. Chan, and S. Parthasarathy. Query by output. In SIGMOD, 2009.
[31]
Q. T. Tran, C. Y. Chan, and S. Parthasarathy. Query reverse engineering. VLDB J., 23(5), 2014.
[32]
C. Wang, A. Cheung, and R. Bodík. Synthesizing highly expressive SQL queries from input-output examples. In PLDI, 2017.
[33]
Y. Y. Weiss and S. Cohen. Reverse engineering spj-queries from examples. In PODS, 2017.
[34]
D. B. West. Introduction to Graph Theory. Prentice Hall, 2 edition, 2000.
[35]
X. Yang, C. M. Procopiuc, and D. Srivastava. Summary graphs for relational database schemas. PVLDB, 4(11), 2011.
[36]
C. Yu and H. V. Jagadish. Schema summarization. In VLDB, 2006.
[37]
C. Yu and H. V. Jagadish. Querying complex structured databases. VLDB, 2007.
[38]
M. Zhang, H. Elmeleegy, C. Procopiuc, and D. Srivastava. Reverse engineering complex join queries. In SIGMOD, 2013.
[39]
M. Zhang, M. Hadjieleftheriou, B. C. Ooi, C. M. Procopiuc, and D. Srivastava. On multi-column foreign key discovery. PVLDB, 3(1), 2010.
[40]
M. Zhang, M. Hadjieleftheriou, B. C. Ooi, C. M. Procopiuc, and D. Srivastava. Automatic discovery of attributes in relational databases. In SIGMOD, 2011.

Cited By

View all
  • (2024)On Reasoning About Black-Box Udfs by Classifying their Performance CharacteristicsProceedings of the 32nd International Conference on Information Systems Development10.62036/ISD.2024.83Online publication date: 2024
  • (2024)Separability and Its Approximations in Ontology-based Data ManagementSemantic Web10.3233/SW-23339115:4(1021-1056)Online publication date: 4-Oct-2024
  • (2024)Gen-T: Table Reclamation in Data Lakes2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00272(3532-3545)Online publication date: 13-May-2024
  • Show More Cited By

Recommendations

Comments

Information & Contributors

Information

Published In

cover image ACM Conferences
SIGMOD '18: Proceedings of the 2018 International Conference on Management of Data
May 2018
1874 pages
ISBN:9781450347037
DOI:10.1145/3183713
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 27 May 2018

Permissions

Request permissions for this article.

Check for updates

Author Tags

  1. automated data lineage discovery
  2. cgm
  3. column coherence

Qualifiers

  • Research-article

Conference

SIGMOD/PODS '18
Sponsor:

Acceptance Rates

SIGMOD '18 Paper Acceptance Rate 90 of 461 submissions, 20%;
Overall Acceptance Rate 785 of 4,003 submissions, 20%

Contributors

Other Metrics

Bibliometrics & Citations

Bibliometrics

Article Metrics

  • Downloads (Last 12 months)23
  • Downloads (Last 6 weeks)5
Reflects downloads up to 03 Oct 2024

Other Metrics

Citations

Cited By

View all
  • (2024)On Reasoning About Black-Box Udfs by Classifying their Performance CharacteristicsProceedings of the 32nd International Conference on Information Systems Development10.62036/ISD.2024.83Online publication date: 2024
  • (2024)Separability and Its Approximations in Ontology-based Data ManagementSemantic Web10.3233/SW-23339115:4(1021-1056)Online publication date: 4-Oct-2024
  • (2024)Gen-T: Table Reclamation in Data Lakes2024 IEEE 40th International Conference on Data Engineering (ICDE)10.1109/ICDE60146.2024.00272(3532-3545)Online publication date: 13-May-2024
  • (2024)Solving Why Not Questions for Aggregate Constraints Through Query Repair2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)10.1109/EuroSPW61312.2024.00072(592-596)Online publication date: 8-Jul-2024
  • (2023)Reverse engineering of temporal queries mediated by LTL ontologiesProceedings of the Thirty-Second International Joint Conference on Artificial Intelligence10.24963/ijcai.2023/360(3230-3238)Online publication date: 19-Aug-2023
  • (2023)Hybrid Query and Instance Explanations and RepairsCompanion Proceedings of the ACM Web Conference 202310.1145/3543873.3587565(1559-1562)Online publication date: 30-Apr-2023
  • (2023)SQL Synthesis with Input-Output Example Based on Deep Learning2023 International Joint Conference on Neural Networks (IJCNN)10.1109/IJCNN54540.2023.10191168(1-8)Online publication date: 18-Jun-2023
  • (2023)Ver: View Discovery in the Wild2023 IEEE 39th International Conference on Data Engineering (ICDE)10.1109/ICDE55515.2023.00045(503-516)Online publication date: Apr-2023
  • (2023)Personalized Rumor Refutation Through Graph Regular Pattern2023 IEEE 8th International Conference on Big Data Analytics (ICBDA)10.1109/ICBDA57405.2023.10104992(165-174)Online publication date: 3-Mar-2023
  • (2022)Example-based Spatial Search at Scale2022 IEEE 38th International Conference on Data Engineering (ICDE)10.1109/ICDE53745.2022.00045(539-551)Online publication date: May-2022
  • Show More Cited By

View Options

Get Access

Login options

View options

PDF

View or Download as a PDF file.

PDF

eReader

View online with eReader.

eReader

Media

Figures

Other

Tables

Share

Share

Share this Publication link

Share on social media