Learning Join Queries from User Examples

Published: 04 January 2016

Abstract

We investigate the problem of learning join queries from user examples. The user is presented with a set of candidate tuples and is asked to label them as positive or negative examples, depending on whether or not she would like the tuples to be part of the join result. The goal is to quickly infer an arbitrary n-ary join predicate across an arbitrary number m of relations while keeping the number of user interactions as small as possible. We assume no prior knowledge of the integrity constraints across the involved relations. The need to infer a join predicate across multiple relations when the referential constraints are unknown arises in several applications, such as data integration, reverse engineering of database queries, and schema inference. In such scenarios, the number of tuples involved in the join is typically large. We introduce a set of strategies that let us inspect the search space and aggressively prune what we call uninformative tuples, and we directly present to the user the informative ones, that is, those that allow the user to quickly find the goal query she has in mind. In this article, we focus on the inference of joins with equality predicates and also allow disjunctive join predicates and projection in the queries. We precisely characterize the frontier between tractability and intractability for the following problems of interest in these settings: consistency checking, learnability, and deciding the informativeness of a tuple. Next, we propose several strategies for presenting tuples to the user in an order that minimizes the number of interactions. We show the efficiency of our approach through an experimental study on both benchmark and synthetic datasets.
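To make the notions of consistency checking and tuple informativeness more concrete, the following is a minimal sketch for the simplest setting only: a conjunctive equality join between two relations, with no projection and no disjunction. It is an illustration under stated assumptions, not the authors' algorithm. A join predicate is modeled as a set of attribute-position pairs, and all function names (`equal_pairs`, `most_specific`, `is_consistent`, `is_informative`) are hypothetical. A candidate tuple is treated as uninformative when every predicate consistent with the labels given so far agrees on whether to select it.

```python
from itertools import product


def equal_pairs(r, p):
    """Positions (i, j) such that r[i] == p[j] for a candidate pair of rows."""
    return frozenset(
        (i, j)
        for (i, a), (j, b) in product(enumerate(r), enumerate(p))
        if a == b
    )


def most_specific(positives, all_pairs):
    """Largest equality predicate satisfied by every positive example."""
    theta = set(all_pairs)
    for r, p in positives:
        theta &= equal_pairs(r, p)
    return frozenset(theta)


def is_consistent(positives, negatives, all_pairs):
    """True if some predicate selects all positives and no negative.

    It suffices to test the most specific predicate: any predicate that
    selects all positives is a subset of it and so selects at least as much.
    """
    theta = most_specific(positives, all_pairs)
    return all(not theta <= equal_pairs(r, p) for r, p in negatives)


def is_informative(candidate, positives, negatives, all_pairs):
    """True if consistent predicates disagree on the candidate's label."""
    theta = most_specific(positives, all_pairs)
    t = equal_pairs(*candidate)
    if theta <= t:
        return False  # every consistent predicate selects it
    if any((theta & t) <= equal_pairs(r, p) for r, p in negatives):
        return False  # no consistent predicate selects it
    return True


# Assumed toy setting: two relations with two attributes each.
ALL_PAIRS = frozenset(product(range(2), range(2)))
positives = [((1, 2), (1, 2))]   # user labeled this pair of rows as wanted
negatives = [((1, 2), (3, 2))]   # user labeled this pair of rows as unwanted

print(is_consistent(positives, negatives, ALL_PAIRS))
print(is_informative(((5, 6), (5, 7)), positives, negatives, ALL_PAIRS))
print(is_informative(((4, 9), (8, 9)), positives, negatives, ALL_PAIRS))
```

In this toy setting, the candidate `((5, 6), (5, 7))` is worth showing to the user because a predicate on the first attributes alone would select it while a predicate on both attributes would not. The candidate `((4, 9), (8, 9))` agrees only on the second attributes, exactly like the negative example, so its label is already implied and it can be pruned.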

    Published In

    ACM Transactions on Database Systems  Volume 40, Issue 4
    Special Issue: Invited 2014 PODS and EDBT Revised Articles
    February 2016
    248 pages
    ISSN:0362-5915
    EISSN:1557-4644
    DOI:10.1145/2866579
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 04 January 2016
    Accepted: 01 August 2015
    Revised: 01 June 2015
    Received: 01 December 2014
    Published in TODS Volume 40, Issue 4

    Author Tags

    1. SQL query discovery
    2. incomplete schema
    3. reverse engineering

    Qualifiers

    • Research-article
    • Research
    • Refereed
