Abstract
Explaining why an answer is (or is not) returned by a query is important for many applications, including auditing, debugging data and queries, and answering hypothetical questions about data. In this work, we present the first practical approach for answering such questions for queries with negation (first-order queries). Specifically, we introduce a graph-based provenance model that, while syntactic in nature, supports reverse reasoning and is proven to encode a wide range of provenance models from the literature. The implementation of this model in our PUG (Provenance Unification through Graphs) system takes a provenance question and a Datalog query as input and generates a Datalog program that computes an explanation, i.e., the part of the provenance that is relevant to answering the question. Furthermore, we demonstrate how a desirable factorization of provenance can be achieved by rewriting an input query. We experimentally evaluate our approach and demonstrate its efficiency.
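To make the notions of why and why-not questions concrete, the following is a minimal Python sketch for a single Datalog rule with negation. It is not the PUG implementation: PUG generates a Datalog program that computes the explanation, whereas this sketch simply enumerates rule derivations over the active domain. The rule, the relation name `Train`, the sample facts, and the helper names `goals`, `query`, and `explain` are all assumptions introduced for illustration.

```python
from itertools import product

# Hypothetical edge relation Train(from, to); illustrative data only.
TRAIN = {("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")}


def active_domain(edb):
    """All constants that appear in the input relation."""
    return sorted({v for t in edb for v in t})


def goals(x, y, z, edb):
    """Truth value of each goal of the illustrative rule
    Q(X,Y) :- Train(X,Z), Train(Z,Y), not Train(X,Y)
    under the binding X=x, Y=y, Z=z."""
    return [
        (f"Train({x},{z})", (x, z) in edb),
        (f"Train({z},{y})", (z, y) in edb),
        (f"not Train({x},{y})", (x, y) not in edb),
    ]


def query(edb):
    """Evaluate the rule: all (x, y) with at least one successful derivation."""
    dom = active_domain(edb)
    return {
        (x, y)
        for x, y, z in product(dom, repeat=3)
        if all(ok for _, ok in goals(x, y, z, edb))
    }


def explain(x, y, edb):
    """Answer a provenance question about the tuple Q(x, y).

    If Q(x, y) is an answer, return ("why", successful derivations), each
    given as (binding of Z, goals used). Otherwise return
    ("why-not", failed derivations), each given as
    (binding of Z, goals that failed under that binding).
    """
    successes, failures = [], []
    for z in active_domain(edb):
        gs = goals(x, y, z, edb)
        if all(ok for _, ok in gs):
            successes.append((z, [g for g, _ in gs]))
        else:
            failures.append((z, [g for g, ok in gs if not ok]))
    if successes:
        return "why", successes
    return "why-not", failures


if __name__ == "__main__":
    print(explain("a", "d", TRAIN))  # a why question
    print(explain("a", "c", TRAIN))  # a why-not question
```

In this sketch, a why-not explanation has to account for every binding of the existential variable `Z`, which hints at why explanations for missing answers can become large and why restricting the computation to the part of the provenance relevant to the question matters.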
Notes
Or, equivalently, queries in full relational algebra (without aggregation), formulas in first-order (FO) logic under the closed-world assumption, and SPJUD queries (select, project, join, union, difference).
This follows from the semantics of the type of 2-player game used here. The details are beyond the scope of this paper.
We restrict the discussion to sentences only for simplicity. The arguments here also hold for formulas with free variables.
In [34], relational algebra is used to express queries, and nodes of f-trees represent equivalence classes of attributes, which in Datalog correspond to query variables.
References
Arab, B., Gawlick, D., Radhakrishnan, V., Guo, H., Glavic, B.: A generic provenance middleware for database queries, updates, and transactions. In: TaPP (2014)
Bidoit, N., Herschel, M., Tzompanaki, K.: Immutably answering why-not questions for equivalent conjunctive queries. In: TaPP (2014)
Bidoit, N., Herschel, M., Tzompanaki, K., et al.: Query-based why-not provenance with NedExplain. In: EDBT, pp. 145–156 (2014)
Chapman, A., Jagadish, H.V.: Why not? In: SIGMOD, pp. 523–534 (2009)
Cheney, J., Chiticariu, L., Tan, W.: Provenance in databases: why, how, and where. Found. Trends Databases 1(4), 379–474 (2009)
Damásio, C.V., Analyti, A., Antoniou, G.: Justifications for logic programming. In: Logic Programming and Nonmonotonic Reasoning, pp. 530–542 (2013)
Deutch, D., Gilad, A., Moskovitch, Y.: Selective provenance for datalog programs using top-k queries. PVLDB 8(12), 1394–1405 (2015)
Deutch, D., Milo, T., Roy, S., Tannen, V.: Circuits for datalog provenance. In: ICDT, pp. 201–212 (2014)
Fehrenbach, S., Cheney, J.: Language-integrated provenance. Sci. Comput. Programm. 155, 103–145 (2017)
Flum, J., Kubierschky, M., Ludäscher, B.: Total and partial well-founded datalog coincide. In: ICDT, pp. 113–124 (1997)
Glavic, B., Köhler, S., Riddle, S., Ludäscher, B.: Towards constraint-based explanations for answers and non-answers. In: TaPP (2015)
Glavic, B., Miller, R.J., Alonso, G.: Using SQL for efficient generation and querying of provenance information. In: Tannen, V., Wong, L., Libkin, L., Fan, W., Tan, W.C., Fourman, M. (eds.) In Search of Elegance in the Theory and Practice of Computation, pp. 291–320. Springer, Berlin (2013)
Grädel, E., Tannen, V.: Semiring provenance for first-order model checking (2017). arXiv:1712.01980
Green, T.: Containment of conjunctive queries on annotated relations. Theory Comput. Syst. 49(2), 429–459 (2011)
Green, T., Karvounarakis, G., Tannen, V.: Provenance semirings. In: PODS, pp. 31–40 (2007)
Green, T.J., Aref, M., Karvounarakis, G.: LogicBlox, platform and language: a tutorial. In: Datalog in Academia and Industry, pp. 1–8. Springer, Berlin (2012)
Green, T.J., Karvounarakis, G., Ives, Z.G., Tannen, V.: Update exchange with mappings and provenance. In: VLDB, pp. 675–686 (2007)
Green, T.J., Tannen, V.: The semiring framework for database provenance. In: PODS, pp. 93–99 (2017)
Herschel, M., Diestelkämper, R., Lahmar, H.B.: A survey on provenance: What for? what form? what from? VLDB J 9(3), 1–26 (2017)
Herschel, M., Hernandez, M.: Explaining missing answers to SPJUA queries. PVLDB 3(1), 185–196 (2010)
Huang, J., Chen, T., Doan, A., Naughton, J.: On the provenance of non-answers to queries over extracted data. In: VLDB, pp. 736–747 (2008)
Karvounarakis, G., Green, T.J.: Semiring-annotated data: queries and provenance. SIGMOD Rec. 41(3), 5–14 (2012)
Köhler, S., Ludäscher, B., Smaragdakis, Y.: Declarative datalog debugging for mere mortals. In: Datalog 2.0: Datalog in Academia and Industry, pp. 111–122 (2012)
Köhler, S., Ludäscher, B., Zinn, D.: First-order provenance games. In: Tannen, V., Wong, L., Libkin, L., Fan, W., Tan, W.C., Fourman, M. (eds.) In Search of Elegance in the Theory and Practice of Computation, pp. 382–399. Springer, Berlin (2013)
Lee, S., Köhler, S., Ludäscher, B., Glavic, B.: Efficiently computing provenance graphs for queries with negation. Technical Report CoRR (2016). arXiv:1701.05699
Lee, S., Köhler, S., Ludäscher, B., Glavic, B.: A SQL-middleware unifying why and why-not provenance for first-order queries. In: ICDE, pp. 485–496 (2017)
Lee, S., Ludäscher, B., Glavic, B.: PUG: A framework and practical implementation for why and why-not provenance (extended version). Technical Report CoRR (2018). arXiv:1808.05752
Lee, S., Niu, X., Ludäscher, B., Glavic, B.: Integrating approximate summarization with provenance capture. In: TaPP (2017)
Meliou, A., Gatterbauer, W., Moore, K., Suciu, D.: The complexity of causality and responsibility for query answers and non-answers. PVLDB 4(1), 34–45 (2010)
Meliou, A., Gatterbauer, W., Suciu, D.: Reverse data management. PVLDB 4(12), 1490–1493 (2011)
Meliou, A., Suciu, D.: Tiresias: The database oracle for how-to queries. In: SIGMOD, pp. 337–348 (2012)
Niu, X., Kapoor, R., Glavic, B., Gawlick, D., Liu, Z.H., Krishnaswamy, V., Radhakrishnan, V.: Provenance-aware query optimization. In: ICDE, pp. 473–484 (2017)
Olteanu, D., Závodný, J.: Factorised representations of query results: size bounds and readability. In: ICDT, pp. 285–298. ACM (2012)
Olteanu, D., Závodný, J.: Size bounds for factorised representations of query results. ACM Trans. Database Syst. (TODS) 40(1), 2 (2015)
Riddle, S., Köhler, S., Ludäscher, B.: Towards constraint provenance games. In: TaPP (2014)
Roy, S., Orr, L., Suciu, D.: Explaining query answers with explanation-ready databases. Proc. VLDB Endow. 9(4), 348–359 (2015)
Roy, S., Suciu, D.: A formal approach to finding explanations for database queries. In: SIGMOD (2014)
Senellart, P.: Provenance and probabilities in relational databases. ACM SIGMOD Rec. 46(4), 5–15 (2018)
Tannen, V.: Provenance analysis for FOL model checking. ACM SIGLOG News 4(1), 24–36 (2017)
Tran, Q.T., Chan, C.-Y.: How to conquer why-not questions. In: SIGMOD, pp. 15–26 (2010)
Wu, E., Madden, S.: Scorpion: explaining away outliers in aggregate queries. PVLDB 6(8), 553–564 (2013)
Wu, Y., Zhao, M., Haeberlen, A., Zhou, W., Loo, B.T.: Diagnosing missing events in distributed systems with negative provenance. In: SIGCOMM, pp. 383–394 (2014)
Xu, J., Zhang, W., Alawini, A., Tannen, V.: Provenance analysis for missing answers and integrity repairs. IEEE Data Eng. Bull. 41(1), 39–50 (2018)
Zhou, W., Sherr, M., Tao, T., Li, X., Loo, B.T., Mao, Y.: Efficient querying and maintenance of network provenance at internet-scale. In: SIGMOD, pp. 615–626 (2010)
Acknowledgements
This work was supported by NSF Awards OAC-1640864, OAC-1541450, and SMA-1637155. Opinions and findings expressed in this material are those of the authors and do not necessarily reflect the views of the NSF.
Cite this article
Lee, S., Ludäscher, B. & Glavic, B. PUG: a framework and practical implementation for why and why-not provenance. The VLDB Journal 28, 47–71 (2019). https://doi.org/10.1007/s00778-018-0518-5
DOI: https://doi.org/10.1007/s00778-018-0518-5