DOI: 10.1145/3514221.3517912

Computing the Shapley Value of Facts in Query Answering

Published: 11 June 2022

    Abstract

    The Shapley value is a game-theoretic notion for wealth distribution that is nowadays extensively used to explain complex data-intensive computation, for instance, in network analysis or machine learning. Recent theoretical works show that query evaluation over relational databases fits well in this explanation paradigm. Yet, these works fall short of providing practical solutions to the computational challenge inherent to the Shapley computation. We present in this paper two practically effective solutions for computing Shapley values in query answering. We start by establishing a tight theoretical connection to the extensively studied problem of query evaluation over probabilistic databases, which allows us to obtain a polynomial-time algorithm for the class of queries for which probability computation is tractable. We then propose a first practical solution for computing Shapley values that adopts tools from probabilistic query evaluation. In particular, we capture the dependence of query answers on input database facts using Boolean expressions (data provenance), and then transform it, via Knowledge Compilation, into a particular circuit form for which we devise an algorithm for computing the Shapley values. Our second practical solution is a faster yet inexact approach that transforms the provenance to a Conjunctive Normal Form and uses a heuristic to compute the Shapley values. Our experiments on TPC-H and IMDB demonstrate the practical effectiveness of our solutions.
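
    To make the quantity in the abstract concrete, the sketch below computes the Shapley value of each fact directly from the definition, treating the Boolean provenance of a query answer as the game. It is a brute-force illustration that enumerates every subset of facts, not the knowledge-compilation algorithm or the CNF heuristic the paper proposes. The function names, the three facts a, b, c, and the formula (a AND b) OR c are hypothetical, and the sketch assumes every fact is a player (no facts are held fixed).

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(facts, provenance):
    """Exact Shapley value of every fact, by enumerating all subsets.

    `provenance` maps a set of present facts to True/False (is the query
    answer produced?). Runtime is exponential in len(facts).
    """
    n = len(facts)
    values = {}
    for f in facts:
        others = [g for g in facts if g != f]
        total = Fraction(0)
        for k in range(n):
            # Weight of a coalition of size k: k! (n - k - 1)! / n!
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                present = set(subset)
                # Marginal contribution of f to this coalition (0 or 1 here).
                gain = int(provenance(present | {f})) - int(provenance(present))
                total += weight * gain
        values[f] = total
    return values


# Hypothetical provenance of one query answer: (a AND b) OR c.
def example_provenance(present):
    return ("a" in present and "b" in present) or "c" in present


example = shapley_values(["a", "b", "c"], example_provenance)
```

    In this toy instance, c alone suffices to produce the answer, so it receives 2/3, while a and b, which are only useful together, receive 1/6 each. The values sum to 1, the difference between the answer being present with all facts and absent with none.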

    Published In

    SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
    June 2022
    2597 pages
    ISBN: 9781450392495
    DOI: 10.1145/3514221
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. knowledge compilation
    2. provenance
    3. shapley value

    Qualifiers

    • Research-article

    Conference

    SIGMOD/PODS '22
    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%

    Article Metrics

    • Downloads (last 12 months): 167
    • Downloads (last 6 weeks): 7
    Reflects downloads up to 27 Jul 2024

    Cited By

    • (2024) P-Shapley: Shapley Values on Probabilistic Classifiers. Proceedings of the VLDB Endowment 17:7 (1737-1750). DOI: 10.14778/3654621.3654638. Online publication date: 30-May-2024
    • (2024) The Generalized Causal-Effect Score in Data Management (short paper). Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI (32-35). DOI: 10.1145/3665601.3669843. Online publication date: 9-Jun-2024
    • (2024) Banzhaf Values for Facts in Query Answering. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654926. Online publication date: 30-May-2024
    • (2024) When is Shapley Value Computation a Matter of Counting? Proceedings of the ACM on Management of Data 2:2 (1-24). DOI: 10.1145/3651606. Online publication date: 14-May-2024
    • (2024) From Shapley Value to Model Counting and Back. Proceedings of the ACM on Management of Data 2:2 (1-23). DOI: 10.1145/3651142. Online publication date: 14-May-2024
    • (2024) Applications and Computation of the Shapley Value in Databases and Machine Learning. Companion of the 2024 International Conference on Management of Data (630-635). DOI: 10.1145/3626246.3654680. Online publication date: 9-Jun-2024
    • (2023) The Shapley Value in Database Management. ACM SIGMOD Record 52:2 (6-17). DOI: 10.1145/3615952.3615954. Online publication date: 11-Aug-2023
    • (2023) Efficient Sampling Approaches to Shapley Value Approximation. Proceedings of the ACM on Management of Data 1:1 (1-24). DOI: 10.1145/3588728. Online publication date: 30-May-2023
    • (2023) On Explaining Confounding Bias. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (1846-1859). DOI: 10.1109/ICDE55515.2023.00144. Online publication date: Apr-2023
    • (2023) Dynamic Shapley Value Computation. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (639-652). DOI: 10.1109/ICDE55515.2023.00055. Online publication date: Apr-2023
