DOI: 10.1145/3514221.3517912

Computing the Shapley Value of Facts in Query Answering

Published: 11 June 2022
  • Abstract

    The Shapley value is a game-theoretic notion for wealth distribution that is nowadays extensively used to explain complex data-intensive computation, for instance, in network analysis or machine learning. Recent theoretical works show that query evaluation over relational databases fits well in this explanation paradigm. Yet, these works fall short of providing practical solutions to the computational challenge inherent to the Shapley computation. In this paper, we present two practically effective solutions for computing Shapley values in query answering. We start by establishing a tight theoretical connection to the extensively studied problem of query evaluation over probabilistic databases, which allows us to obtain a polynomial-time algorithm for the class of queries for which probability computation is tractable. We then propose a first practical solution for computing Shapley values that adopts tools from probabilistic query evaluation. In particular, we capture the dependence of query answers on input database facts using Boolean expressions (data provenance), and then transform them, via Knowledge Compilation, into a particular circuit form for which we devise an algorithm for computing the Shapley values. Our second practical solution is a faster yet inexact approach that transforms the provenance to a Conjunctive Normal Form and uses a heuristic to compute the Shapley values. Our experiments on TPC-H and IMDB demonstrate the practical effectiveness of our solutions.
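
    To make the quantity in the abstract concrete, the following is a minimal sketch of the Shapley value of facts computed directly from the standard definition, by enumerating every subset of the other facts. It is not the paper's algorithm: it runs in exponential time, which is exactly the cost the circuit-based and CNF-based approaches aim to avoid. The sketch assumes that every fact is endogenous, and the fact names `a`, `b`, `c`, the function names, and the provenance expression `(a AND b) OR c` are hypothetical.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(facts, provenance):
    """Brute-force Shapley value of each fact for a Boolean query.

    `provenance` is a function from a set of present facts to True/False
    (a stand-in for the Boolean provenance expression of a query answer).
    Enumerates all subsets, so it is only usable for a handful of facts.
    """
    n = len(facts)
    values = {}
    for f in facts:
        others = [g for g in facts if g != f]
        total = Fraction(0)
        for k in range(n):
            # Weight of a coalition of size k: k! * (n - k - 1)! / n!
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                present = set(subset)
                # Marginal contribution of f: does adding f flip the answer?
                gain = int(provenance(present | {f})) - int(provenance(present))
                total += weight * gain
        values[f] = total
    return values


def example_provenance(present):
    # Hypothetical provenance expression: (a AND b) OR c
    return ("a" in present and "b" in present) or ("c" in present)


values = shapley_values(["a", "b", "c"], example_provenance)
for fact, value in values.items():
    print(fact, value)
```

    In this toy instance the fact `c` alone satisfies the query and so receives the largest share, while `a` and `b` are only useful together and split the remainder equally. The values sum to 1, the difference between the query result on the full set of facts and on the empty set.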






    Information & Contributors


    Published In

    SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data
    June 2022
    2597 pages



    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. knowledge compilation
    2. provenance
    3. shapley value


    • Research-article

    Acceptance Rates

    Overall Acceptance Rate 785 of 4,003 submissions, 20%


    Bibliometrics & Citations


    Article Metrics

    • Downloads (Last 12 months): 167
    • Downloads (Last 6 weeks): 7
    Reflects downloads up to 27 Jul 2024


    Cited By

    • (2024) P-Shapley: Shapley Values on Probabilistic Classifiers. Proceedings of the VLDB Endowment 17:7 (1737-1750). DOI: 10.14778/3654621.3654638. Online publication date: 30-May-2024
    • (2024) The Generalized Causal-Effect Score in Data Management (short paper). Proceedings of the Conference on Governance, Understanding and Integration of Data for Effective and Responsible AI (32-35). DOI: 10.1145/3665601.3669843. Online publication date: 9-Jun-2024
    • (2024) Banzhaf Values for Facts in Query Answering. Proceedings of the ACM on Management of Data 2:3 (1-26). DOI: 10.1145/3654926. Online publication date: 30-May-2024
    • (2024) When is Shapley Value Computation a Matter of Counting? Proceedings of the ACM on Management of Data 2:2 (1-24). DOI: 10.1145/3651606. Online publication date: 14-May-2024
    • (2024) From Shapley Value to Model Counting and Back. Proceedings of the ACM on Management of Data 2:2 (1-23). DOI: 10.1145/3651142. Online publication date: 14-May-2024
    • (2024) Applications and Computation of the Shapley Value in Databases and Machine Learning. Companion of the 2024 International Conference on Management of Data (630-635). DOI: 10.1145/3626246.3654680. Online publication date: 9-Jun-2024
    • (2023) The Shapley Value in Database Management. ACM SIGMOD Record 52:2 (6-17). DOI: 10.1145/3615952.3615954. Online publication date: 11-Aug-2023
    • (2023) Efficient Sampling Approaches to Shapley Value Approximation. Proceedings of the ACM on Management of Data 1:1 (1-24). DOI: 10.1145/3588728. Online publication date: 30-May-2023
    • (2023) On Explaining Confounding Bias. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (1846-1859). DOI: 10.1109/ICDE55515.2023.00144. Online publication date: Apr-2023
    • (2023) Dynamic Shapley Value Computation. 2023 IEEE 39th International Conference on Data Engineering (ICDE) (639-652). DOI: 10.1109/ICDE55515.2023.00055. Online publication date: Apr-2023
