research-article
Open access

Banzhaf Values for Facts in Query Answering

Published: 30 May 2024

Abstract

Quantifying the contribution of database facts to query answers has been studied as a means of explanation. The Banzhaf value, originally developed in game theory, is a natural measure of fact contribution, yet its efficient computation for select-project-join-union queries is challenging. In this paper, we introduce three algorithms to compute the Banzhaf value of database facts: an exact algorithm, an anytime deterministic approximation algorithm with relative error guarantees, and an algorithm for ranking and top-k. They have three key building blocks: compilation of query lineage into an equivalent function that allows efficient Banzhaf value computation; dynamic programming computation of the Banzhaf values of variables in a Boolean function using the Banzhaf values for constituent functions; and a mechanism to efficiently compute lower and upper bounds on Banzhaf values for any positive DNF function.
We complement the algorithms with a dichotomy for the Banzhaf-based ranking problem: given two facts, deciding whether the Banzhaf value of one is greater than that of the other is tractable for hierarchical queries and intractable for non-hierarchical queries.
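The hierarchical property behind this dichotomy can be illustrated with a minimal sketch. It assumes the standard definition from the probabilistic-database literature for Boolean conjunctive queries: for any two variables, the sets of atoms containing them are either disjoint or one contains the other. The function name `is_hierarchical` and the dictionary encoding of a query are hypothetical choices made for this illustration and are not taken from the paper.

```python
from itertools import combinations


def is_hierarchical(atoms):
    """Check the hierarchical property of a Boolean conjunctive query.

    `atoms` maps each atom (relation) name to the set of variables it
    contains. This encoding is an assumption made for illustration.
    """
    # For each variable, collect the atoms in which it occurs.
    var_atoms = {}
    for name, variables in atoms.items():
        for v in variables:
            var_atoms.setdefault(v, set()).add(name)

    # Every pair of variables must have nested or disjoint atom sets.
    for x, y in combinations(var_atoms, 2):
        ax, ay = var_atoms[x], var_atoms[y]
        if not (ax <= ay or ay <= ax or ax.isdisjoint(ay)):
            return False
    return True


# Q1 = R(x), S(x, y): atoms(y) is contained in atoms(x), so Q1 is hierarchical.
q_hierarchical = {"R": {"x"}, "S": {"x", "y"}}

# Q2 = R(x), S(x, y), T(y): atoms(x) and atoms(y) overlap without nesting.
q_non_hierarchical = {"R": {"x"}, "S": {"x", "y"}, "T": {"y"}}
```

Under this definition, the first query is hierarchical and the second, a textbook non-hierarchical pattern, is not.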
We show experimentally that our algorithms significantly outperform exact and approximate algorithms from prior work, often by up to two orders of magnitude. Our algorithms can also solve challenging problem instances that are beyond the reach of prior work.
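To make the measure concrete, the following is a minimal brute-force sketch of the Banzhaf value of a variable in a positive DNF lineage. It assumes the standard unnormalized definition: the number of subsets of the other variables for which adding the variable flips the function from false to true. It is exponential in the number of variables and is not one of the paper's algorithms; the names `eval_dnf` and `banzhaf` and the example lineage are hypothetical.

```python
from itertools import product


def eval_dnf(clauses, true_vars):
    """Evaluate a positive DNF: true if some clause is fully satisfied."""
    return any(clause <= true_vars for clause in clauses)


def banzhaf(clauses, variables, x):
    """Brute-force Banzhaf value of variable x (exponential time).

    Counts the subsets Y of the other variables such that the function
    is true on Y plus x but false on Y alone.
    """
    others = [v for v in variables if v != x]
    total = 0
    for bits in product([False, True], repeat=len(others)):
        chosen = frozenset(v for v, b in zip(others, bits) if b)
        with_x = eval_dnf(clauses, chosen | {x})
        without_x = eval_dnf(clauses, chosen)
        total += int(with_x) - int(without_x)
    return total


# Hypothetical lineage (a AND b) OR (a AND c) over facts a, b, c.
lineage = [frozenset({"a", "b"}), frozenset({"a", "c"})]
variables = ["a", "b", "c"]

values = {v: banzhaf(lineage, variables, v) for v in variables}
print(values)  # {'a': 3, 'b': 1, 'c': 1}
```

In this example fact `a` occurs in every clause and therefore receives the highest value, which is the kind of ordering that ranking and top-k computations aim to recover without enumerating all subsets.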


Published In

Proceedings of the ACM on Management of Data, Volume 2, Issue 3
SIGMOD
June 2024
1953 pages
EISSN:2836-6573
DOI:10.1145/3670010
This work is licensed under a Creative Commons Attribution International 4.0 License.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 30 May 2024
Published in PACMMOD Volume 2, Issue 3

Author Tags

  1. banzhaf values
  2. explanations
  3. lineage

Qualifiers

  • Research-article
