research-article

Computing the Shapley Value of Facts in Query Answering

Authors:

Benny Kimelfeld,

Mikaël Monet

SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data

Pages 1570 - 1583

https://doi.org/10.1145/3514221.3517912

Published: 11 June 2022

Abstract

The Shapley value is a game-theoretic notion for wealth distribution that is nowadays extensively used to explain complex data-intensive computation, for instance, in network analysis or machine learning. Recent theoretical works show that query evaluation over relational databases fits well in this explanation paradigm. Yet, these works fall short of providing practical solutions to the computational challenge inherent to the Shapley computation. We present in this paper two practically effective solutions for computing Shapley values in query answering. We start by establishing a tight theoretical connection to the extensively studied problem of query evaluation over probabilistic databases, which allows us to obtain a polynomial-time algorithm for the class of queries for which probability computation is tractable. We then propose a first practical solution for computing Shapley values that adopts tools from probabilistic query evaluation. In particular, we capture the dependence of query answers on input database facts using Boolean expressions (data provenance), and then transform it, via Knowledge Compilation, into a particular circuit form for which we devise an algorithm for computing the Shapley values. Our second practical solution is a faster yet inexact approach that transforms the provenance to a Conjunctive Normal Form and uses a heuristic to compute the Shapley values. Our experiments on TPC-H and IMDB demonstrate the practical effectiveness of our solutions.
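To make the quantity discussed in the abstract concrete, the following is a minimal sketch that computes the Shapley value of each fact directly from the textbook definition, given a Boolean provenance function. It is not the knowledge-compilation algorithm or the CNF heuristic the abstract describes: it enumerates all subsets and so runs in exponential time. The function names, the fact identifiers `a`, `b`, `c`, and the example provenance are hypothetical, and all facts are assumed to be endogenous.

```python
from fractions import Fraction
from itertools import combinations
from math import factorial


def shapley_values(facts, provenance):
    """Brute-force Shapley value of each fact (illustrative only).

    facts: list of hashable fact identifiers, all assumed endogenous.
    provenance: callable mapping a frozenset of present facts to True/False.
    """
    n = len(facts)
    values = {}
    for f in facts:
        others = [g for g in facts if g != f]
        total = Fraction(0)
        for k in range(n):
            # Weight of coalitions of size k: k! (n - k - 1)! / n!
            weight = Fraction(factorial(k) * factorial(n - k - 1), factorial(n))
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of f when added to coalition s.
                gain = int(provenance(s | {f})) - int(provenance(s))
                total += weight * gain
        values[f] = total
    return values


# Hypothetical provenance: the query answer holds if (a AND b) OR c.
def example_provenance(present):
    return ("a" in present and "b" in present) or ("c" in present)


result = shapley_values(["a", "b", "c"], example_provenance)
for fact, value in result.items():
    print(fact, value)
```

In this example the values sum to 1, the difference between the query result on the full set of facts and on the empty set, which is the efficiency property of the Shapley value. Exact fractions are used so that this check is not affected by floating-point rounding.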



Index Terms

Computing the Shapley Value of Facts in Query Answering
1. Information systems
  1. Data management systems
    1. Database design and models
      1. Data model extensions
        Data provenance
      2. Relational database model


Published In


SIGMOD '22: Proceedings of the 2022 International Conference on Management of Data

June 2022

2597 pages

ISBN:9781450392495

DOI:10.1145/3514221

General Chair: Zachary Ives, University of Pennsylvania (USA)

Program Chairs: Angela Bonifati, Lyon 1 University (France); Amr El Abbadi, University of California, Santa Barbara (USA)

Copyright © 2022 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOD: ACM Special Interest Group on Management of Data

Publisher

Association for Computing Machinery

New York, NY, United States


Conference

SIGMOD/PODS '22

Sponsor:

SIGMOD

SIGMOD/PODS '22: International Conference on Management of Data

June 12 - 17, 2022

Philadelphia, PA, USA

Acceptance Rates

Overall Acceptance Rate 785 of 4,003 submissions, 20%
